Working Groups/Science/swat4ls hackathon/lightning

David Shotton
David gave a presentation on MIIDI, a Minimal Information standard for reporting an Infectious Disease Investigation. MIIDI was developed following his semantic enhancement work on a paper by [http://dx.doi.org/10.1371/journal.pntd.0000228.x001 Reis et al. (2008)] describing the risk of contracting leptospirosis (Weil's disease) in Salvador, Brazil. Leptospirosis is a bacterial disease caused by spirochetes and transmitted through rat urine in urban slums with poor sanitation and hygiene, particularly when seasonal heavy rainfall floods open sewers. David explained how his original hand-crafted summary of the Reis et al. paper inspired the development of MIIDI, and how Tanya Gray converted the original draft of MIIDI into a formal XML data model and created two tools based on that model, an online tool and a Java tool, for entering rich metadata about an infectious disease investigation. The resulting summaries include, in addition to factual statements, the primary conclusions of the research paper in question (for the Reis et al. paper, the direct correlation between disease risk and the proximity of people's homes to rat-infested rubbish dumps and open sewers, and the indirect correlations with race and poverty).
He also described the Open Citations Corpus, an RDF triple store containing all the bibliographic references in the Open Access subset of PubMed Central (some 200,000 journal articles containing >6 million references to >3 million unique cited articles), and how this information could be used to determine the most highly cited papers for each of the major tropical infectious diseases.
Finally, he introduced the vision for Open Research Reports (ORRs): MIIDI-structured summaries of infectious disease papers that could be openly published in both human- and machine-readable form. ORRs would provide access to health information that is otherwise locked behind journal subscription barriers and thus potentially unavailable to people who need it in the developing world. The idea is that ORRs could be created first for the most highly cited papers. The subsequent discussion centered on the effort of creating such rich metadata even with the MIIDI tools available, on who would create the metadata, and on the potential incentives and benefits for ORR creators. These benefits include citable mini-publications, if their Open Research Reports were published with citable DOIs in instant online open access journals (e.g. Open Research Reports in Malaria, Open Research Reports in HIV/AIDS).
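
The citation ranking that David described for the Open Citations Corpus can be illustrated with a small sketch. This is not how the corpus is actually queried (it is an RDF triple store, and the notes do not record the query used); it only shows the underlying idea of counting incoming citations per disease. The paper identifiers, the disease labels and the function name `most_cited` are all hypothetical.

```python
from collections import Counter

# Hypothetical (citing paper, cited paper) pairs, standing in for the
# reference links held in a citation corpus.
CITATIONS = [
    ("p1", "lepto-1"),
    ("p2", "lepto-1"),
    ("p3", "lepto-1"),
    ("p1", "lepto-2"),
    ("p3", "lepto-2"),
    ("p2", "malaria-1"),
]

# Hypothetical assignment of each cited paper to a disease.
DISEASE_OF = {
    "lepto-1": "leptospirosis",
    "lepto-2": "leptospirosis",
    "malaria-1": "malaria",
}


def most_cited(citations, disease_of, disease, top_n=3):
    """Rank the papers about one disease by how often they are cited."""
    counts = Counter(
        cited for _citing, cited in citations
        if disease_of.get(cited) == disease
    )
    return counts.most_common(top_n)


ranking = most_cited(CITATIONS, DISEASE_OF, "leptospirosis")
print(ranking)
```

A ranking of this kind would give a starting list of papers for which Open Research Reports could be written first.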

Tanya Gray
Tanya explained the MIIDI data model and demonstrated her two MIIDI-based metadata entry tools in action. The MIIDI model is defined in XML. The web tool uses XForms and AJAX, while the other tool is a local Java application. Tanya suggested that everyone use the tools during the hackathon.

Discussion prompted by these presentations raised the question of whether capturing the required metadata can be automated, so that completing the MIIDI form is not a separate process. The comment was made that people are used to supplying metadata when filling in forms to submit articles and funding requests, and this was picked up as a topic for further discussion.

Mark MacGillivray
Mark presented his work on bibliographic data and records. BibSoup collates information by scraping the metadata embedded in a web page and automatically creates a record from it. This data can then be used to display selected information and to search citations by date, by author, by frequency and so on.
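
As a rough illustration of scraping embedded metadata into a record, the sketch below collects `<meta>` tags from a page using Python's standard `html.parser` module. It is not BibSoup's code: the notes do not say which tags BibSoup reads, so the `citation_*` naming convention, the sample page and the names `MetaTagCollector` and `scrape_record` are assumptions made for the example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="citation_..."> tags into a simple record."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Assumption: bibliographic fields use a "citation_" prefix.
        if name and content and name.startswith("citation_"):
            # A field such as the author may occur more than once.
            self.record.setdefault(name, []).append(content)


def scrape_record(html):
    """Build a bibliographic record from the metadata embedded in a page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.record


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
<meta name="citation_date" content="2008">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

record = scrape_record(SAMPLE_PAGE)
print(record)
```

Records built this way can then be filtered or sorted by any of the collected fields, for example by author or by date.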

The discussion that followed included the suggestion (from Graham Steel) that Mark should talk to Mendeley about its data, which is open.

Peter Murray-Rust
Peter then gave a presentation on semantic referencing using triples. He explained that a disease may be referenced by many articles and an article may reference several diseases. A triple records each such link as a subject, a predicate and an object, and data can be searched by matching on any of these three parts. Peter asked where identifiers should come from and what the required referencing standards and nomenclature should be.
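
The many-to-many link between articles and diseases can be sketched as a list of subject, predicate, object triples with a simple pattern match. This is a minimal plain-Python illustration rather than a real triple store, and the identifiers, the predicate name `discusses` and the function `match` are hypothetical, which is exactly the gap that Peter's question about the source of identifiers points to.

```python
# Each triple is (subject, predicate, object). All identifiers and the
# predicate name are hypothetical, chosen only for illustration.
TRIPLES = [
    ("article:A", "discusses", "disease:leptospirosis"),
    ("article:A", "discusses", "disease:malaria"),
    ("article:B", "discusses", "disease:leptospirosis"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# One disease may be referenced by many articles...
articles_on_lepto = [
    s for s, _, _ in match(TRIPLES, predicate="discusses",
                           obj="disease:leptospirosis")
]

# ...and one article may reference several diseases.
diseases_in_a = [o for _, _, o in match(TRIPLES, subject="article:A")]

print(articles_on_lepto)
print(diseases_in_a)
```

The same pattern answers a question in either direction, which is why agreed identifiers for both articles and diseases matter.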

Jerven Bolleman
One controversial topic that was picked up for discussion was that of minimal standards. The debate was started by Jerven Bolleman, who challenged the notion that minimal standards are the optimal way of recording data. He suggested instead that the maximum amount of information should be stored for future use, with all the research and information gathered, so that the whole picture can be re-examined at a later stage when scientific practices are more developed. The counter-argument was that not all data is required and that much of it duplicates what has previously been identified; the example given was a microwaves study, in which the bulk of the data is not required once the output is established. One clarification assisted the discussion: 'minimal' means the smallest amount of information required to replicate the research, not the 'bare bones' of what seems important at the time of writing up.

Gilles Frydman
Gilles gave a presentation on the production of medical papers and the problem of how to keep records of them all. He demonstrated the exponential rise in the number of papers published over the last few years and explained the problem of cataloguing records and keeping the entries up-to-date, as well as the difficulty for people in keeping abreast of the latest medical developments.