Working Groups/Science/swat4ls hackathon
Semantic Web Applications and Tools for Life Sciences Hackathon
A two day event hacking content, systems and services for the Life Sciences
University of London Union, Malet Street, Bloomsbury, London, WC1E 7HY. Tuesday 6th December and Wednesday 7th December, 2011.
Overview
This FREE hackathon has been co-organised by DevCSI (JISC funded), SWAT4LS and the Open Knowledge Foundation. It brings together delegates from the SWAT4LS workshop and tutorials (taking place on 8-9 December 2011) with researchers, developers and anyone else interested in the Life Sciences, to work in teams or individually to use and enhance existing Open Science semantic web applications and tools, and possibly to develop new ones.
Target Audience
The event will be suitable for:
- Researchers in the Life Sciences
- Software developers in the Life Sciences
- Software developers / researchers developing Semantic Web Applications / Tools / Mashups / Visualisations from the Life Sciences and other domains
- Anyone interested in developing and using Open Science tools particularly in the Life Sciences
- Those offering support to delegates in using data sets, APIs and tools
Programme
We hope that delegates and even those not able to make the event will be able to discuss and propose ideas, suggest tools, share datasets, invite collaborations BEFORE the event using the OKFN Open Science Mailing List and OKFN wiki (see below).
The event will start at 6pm on Tuesday 6th December, when we will encourage participants to present lightning talks, network, share ideas, form teams, and think about what they will be working on the next day. It would be great if you could start thinking about some areas of interest before the event. The next day will start at 9am with teams and individuals outlining their ideas; they will then have approximately 6-7 hours to work on them. All teams and individuals will be invited to present what they have been working on, and the best three ideas/prototypes will be awarded special prizes.
The full version of the programme can be found here
Lightning Talks
See notes from all talks here http://wiki.okfn.org/Working_Groups/Science/swat4ls_hackathon/lightning
- David Shotton - Open Research Reports Presentation
- Tanya Grey - MIIDI input form (See Tools at [1])
- Mark Magillivray
- Peter Murray-Rust
- Jerven Bolleman - Maximum information standards instead of minimal information standards
- Luke McCarthy - SADI in 5 minutes
- Robert Cox
- Helena Deus
- Trish Whetzel
- Yasunori Yamamoto
- Gilles Frydman
Participants
Once registered, please add your name, your expertise and any tools or resources you have that may be useful to other hackathon participants to the participant page [2]
Theme Ideas
We are working closely with the organisers of SWAT4LS and the OKF to make the hackathon the best it can be, and we have already started thinking of some ideas which you are welcome to start on.
Please add your own ideas to the wiki below and feel free to create a separate page if you wish.
Open Disease Research Reports
Aims:
- For individuals / teams to develop a corpus of accessible, semantically linked and enriched information on the current body of research into a major disease, which will be of benefit to both scientists and citizens.
- To generate a high quality set of reports on disease research, available openly to anybody.
- To build upon that base by annotations and mash ups of the underlying information.
Objectives:
- Gathering information from several web sources, including open bibliographic data and open access scientific literature
- Annotation of the collected data
- Generation of interfaces, mashups and visualizations to increase functionality and demonstrate use cases. For example, allowing community annotation of research publications.
- Gather information on the current open body of malaria/CJD/cancer variant research
- Generate open research reports on a subset of papers using the Minimal Information reporting standard for an Infectious Disease Investigation (MIIDI) and input schema at the University of Oxford
See the project page: http://wiki.okfn.org/Working_Groups/Science/swat4ls_hackathon/ORR
Gene-Drug Visualisation
Aims:
- To visualize drug-gene interaction networks using semantic datasets
Objectives:
- To convert the raw DrugBank XML data file (http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip) to RDF
- To convert the RDF/XML to JSON (using PHP's JSON functions)
- To visualise the JSON data in Cytoscape (http://www.cytoscape.org/) via the web plugin
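The three objectives above form a small pipeline: XML in, RDF triples in the middle, JSON out for the viewer. The sketch below walks through that pipeline in Python as a stand-in for the PHP step. It is a minimal illustration under stated assumptions, not the script written at the hackathon: the XML is a hypothetical, heavily simplified stand-in for drugbank.xml, the base URIs and the `example.org` predicate namespace are assumed, and the JSON uses the Cytoscape.js-style `nodes`/`edges` layout rather than whatever format the web plugin expected.

```python
import json
import xml.etree.ElementTree as ET

# HYPOTHETICAL, heavily simplified stand-in for drugbank.xml. The real file
# uses an XML namespace and nests targets far more deeply.
SAMPLE_XML = """
<drugbank>
  <drug id="DB00001">
    <target uniprot="P00734" action="inhibitor"/>
  </drug>
  <drug id="DB00002">
    <target uniprot="P00533" action="antagonist"/>
    <target uniprot="P02745" action="agonist"/>
  </drug>
</drugbank>
"""

# Assumed base URIs, for illustration only.
DRUG_BASE = "http://www.drugbank.ca/drugs/"
PROTEIN_BASE = "http://purl.uniprot.org/uniprot/"
VOCAB_BASE = "http://example.org/vocab#"  # hypothetical predicate namespace


def xml_to_triples(xml_text):
    """Turn each drug-target pair into a (drug URI, action URI, protein URI) triple."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for drug in root.findall("drug"):
        drug_uri = DRUG_BASE + drug.get("id")
        for target in drug.findall("target"):
            triples.append((drug_uri,
                            VOCAB_BASE + target.get("action"),
                            PROTEIN_BASE + target.get("uniprot")))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def to_cytoscape_json(triples):
    """Cytoscape.js-style elements: one node per distinct URI, one edge per triple."""
    nodes, edges = {}, []
    for i, (s, p, o) in enumerate(triples):
        nodes.setdefault(s, {"data": {"id": s, "kind": "drug"}})
        nodes.setdefault(o, {"data": {"id": o, "kind": "protein"}})
        edges.append({"data": {"id": f"e{i}", "source": s, "target": o,
                               "action": p.rsplit("#", 1)[-1]}})
    return json.dumps({"nodes": list(nodes.values()), "edges": edges}, indent=2)


triples = xml_to_triples(SAMPLE_XML)
print(to_ntriples(triples))
print(to_cytoscape_json(triples))
```

Keeping the mode of action on each edge means the same JSON can later be coloured or filtered by relationship type without re-parsing the XML.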
Results
- We created a script to parse the complete DrugBank database (as XML) into an RDF document, which resulted in more than 21,000 edges reflecting interactions between drugs (identified by DrugBank identifiers) and proteins (identified by UniProt identifiers).
- We analysed the types of relationship (agonist, antagonist, etc.):
Fig 1. Frequency of drug-target modes of action
- We then used Cytoscape to visualize the data. In our first attempt, we loaded the entire 21K-triple network. This is what we got:
Fig 2. Drug-protein interaction network from DrugBank. Blue nodes are proteins; red nodes are drugs
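The frequency analysis behind Fig 1 amounts to counting edges per mode of action, and the same labels give a simple way to cut the full 21K-edge network down to a readable subnetwork. The sketch below shows both steps on a handful of hypothetical edges; it assumes each edge is a (drug, mode of action, protein) tuple, which is an assumption about the data layout rather than a description of the team's script.

```python
from collections import Counter

# HYPOTHETICAL (drug, mode of action, protein) edges standing in for the
# ~21,000 DrugBank-derived edges.
EDGES = [
    ("DB00001", "inhibitor", "P00734"),
    ("DB00002", "antagonist", "P00533"),
    ("DB00002", "agonist", "P02745"),
    ("DB00003", "inhibitor", "P00533"),
    ("DB00004", "inhibitor", "P08183"),
]


def action_frequencies(edges):
    """Count how many drug-target edges carry each mode of action (cf. Fig 1)."""
    return Counter(action for _drug, action, _protein in edges)


def filter_by_action(edges, actions):
    """Keep only edges whose mode of action is in `actions`, to shrink the network."""
    return [edge for edge in edges if edge[1] in actions]


freq = action_frequencies(EDGES)
for action, count in freq.most_common():
    print(f"{action:<12}{count}")

subnetwork = filter_by_action(EDGES, {"agonist", "antagonist"})
print(len(subnetwork), "edges kept")
```

Filtering by relationship type is only one option; filtering by node degree or by a chosen list of drugs would work the same way on the edge list.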
Disease Localiser
http://wiki.okfn.org/Working_Groups/Science/Disease_Localiser
Semantic Web Tools/Resources/Discussions
Other Ideas from Post it Notes
- Why do we need data in rdf?
- Use DrugBank to create a visualisation of drugs according to the genes they target, e.g. drug x: 100 genes, drug y: 70 genes, drug z: overlap in some genes
- Make a matrix of genes X diseases, then fill the values with counts of publications that mention both
- Develop a LarKC plugin for concept recognition with 'Peregrine'
- Imagine an Open Research Report as a journal article. What tools could we provide for readers to explore the paper and its data in the context of all of the ORR papers, e.g. interactive visualisations, or finding all papers with the same/similar data (e.g. all for some disease / vector)?
- Design a genome for a synthetic organelle with all necessary (50-150) genes.
- I want to extract new valuable knowledge from my 'semantic' data. How?
Responses: SPARQL endpoints, web interfaces and visualisation tools
- Write a decent Pubmed XML to RDF convertor
- How can linked data be made useful for users (biologists, clinicians)?
- Expose David's citation database as a Semantic Sensitive web service
- Use the XML forms approach to generate an HTML form from RDF (XML) instead of from the XML Schema.
- Could the MIIDI editor be modified to enable the assertions at the end to be entered in a way that is directly transformable to RDF?
- Use the XML forms approach to generate a faceted browser from RDF (instead of from the XML Schema)
- Infer object-oriented classes from an ontology in RDF, and generate both the class code and the code to load the instances into the objects
- A plugin for Mendeley suggesting friendships between researchers who share common references / bibliographic material
- Make a social web for editing / maintaining ontologies in different fields / domains of research
- GitHub-like system for RDF resources
- Maybe in combination with micropublications
- An automatic app which shows geographically, for a given disease, the cases analysed and the locations (shown on a map) where cases were treated. Patients would be able to access papers related to the searched disease and its synonyms.
- How to make it worthwhile for a researcher to enter metadata when submitting an article?
- Tackling problems / diseases such as malaria requires the availability, in a machine-processable format, of data that can help people / organisations decide how to direct scarce resources to the right place at the right time.
It would therefore be useful to identify the sets of parameters, and the levels of their values, that would allow us to choose the areas where action needs to be taken first. The data should produce, at a global (i.e. national or regional) level, a set of indicators of whether or not a particular case should be evaluated more deeply (perhaps with an extended set of parameters). The starting point could be individual patient cases or the annual reports for whole countries. The answer given by the model could help to narrow down the areas where, for some reason, the problem is more likely to escalate.
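The post-it idea of a genes X diseases matrix filled with publication counts is a co-occurrence count. The sketch below is one minimal way to build it, assuming each paper has already been annotated with the sets of genes and diseases it mentions (for instance by a concept recogniser). The annotations and paper IDs are hypothetical placeholders, not real literature data.

```python
from collections import Counter
from itertools import product

# HYPOTHETICAL annotations: which genes and diseases each paper mentions
# (e.g. as produced by a concept recogniser). Paper IDs are placeholders.
PAPERS = {
    "paper-1": {"genes": {"BRCA1", "TP53"}, "diseases": {"breast cancer"}},
    "paper-2": {"genes": {"TP53"}, "diseases": {"breast cancer", "lung cancer"}},
    "paper-3": {"genes": {"PfCRT"}, "diseases": {"malaria"}},
}


def cooccurrence_counts(papers):
    """For every (gene, disease) pair, count the papers that mention both."""
    counts = Counter()
    for annotation in papers.values():
        # Sets guarantee each paper adds at most 1 to any given pair.
        counts.update(product(annotation["genes"], annotation["diseases"]))
    return counts


def to_matrix(counts):
    """Return sorted row labels (genes), column labels (diseases) and count rows."""
    genes = sorted({gene for gene, _ in counts})
    diseases = sorted({disease for _, disease in counts})
    rows = [[counts[(gene, disease)] for disease in diseases] for gene in genes]
    return genes, diseases, rows


genes, diseases, rows = to_matrix(cooccurrence_counts(PAPERS))
print("gene".ljust(8), *diseases, sep=" | ")
for gene, row in zip(genes, rows):
    print(gene.ljust(8), *row, sep=" | ")
```

Storing the counts sparsely in a `Counter` and only expanding to a dense matrix at the end keeps memory proportional to the pairs actually observed, which matters once the gene and disease vocabularies are large.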
Teams and Tasks
- Open Research Reports
- Gene-Drug Interactions Visualisation
- OpenCitations.net (OC.net)
- Added the OC.net URI scheme to the SADI URI-to-metadata resolver service. This adds Semanticscience Integrated Ontology (SIO) standard metadata about records (PubMed) and identifiers, so that their URIs are now semantically typed and can be used for discovery.
- Worked with the Open Citations Corpus developers on a quick fix to their content negotiation so that it returns the RDF we need for SADI
- Used the OC.net RDF representing a PubMed citation to automatically generate an OWL DL model for that citation data
- Used that OWL model as the output definition for a SADI service that consumes OC.net PubMed URIs and returns their metadata. The advantages of converting it from a GET and/or SPARQL query into a SADI service are:
- it can be added into a Web Service workflow
- it can be "batched" (multiple requests in a single call)
- all metadata provided by OC.net is indexed, and thus each piece of metadata can be used for service discovery
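The "batched" advantage comes from SADI services taking an RDF document as input, in which every node typed with the service's input class is treated as one input, so many citations can travel in one call. The sketch below builds such a batch as N-Triples and prepares, but does not send, a single POST. Everything service-specific here is an assumption: the input class URI, the citation URI pattern, the service URL and the `text/rdf+n3` content type are hypothetical placeholders, not the actual OC.net service details.

```python
import urllib.request

# HYPOTHETICAL URIs: not the actual OC.net citation URIs or SADI input class.
INPUT_CLASS = "http://example.org/sadi/oc-net#PubMedCitation"
CITATION_BASE = "http://example.org/pubmed/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_batch_input(pubmed_ids):
    """One N-Triples document that types every requested citation URI as an input."""
    unique_ids = dict.fromkeys(pubmed_ids)  # drop duplicates, keep first-seen order
    lines = [f"<{CITATION_BASE}{pmid}> <{RDF_TYPE}> <{INPUT_CLASS}> ."
             for pmid in unique_ids]
    return "\n".join(lines) + "\n"


def make_request(service_url, payload):
    """Prepare (but do not send) a single POST carrying the whole batch.

    The content type is an assumption about what the service accepts.
    """
    return urllib.request.Request(
        service_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/rdf+n3"},
        method="POST",
    )


payload = build_batch_input(["12345", "67890", "12345"])
print(payload)
request = make_request("http://example.org/sadi/citation-metadata", payload)
# urllib.request.urlopen(request) would send it; omitted here.
```

Because the output graph reuses the input URIs, a client can match each returned metadata record to the citation it asked about without any extra bookkeeping.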
Results
Round ups / Feedback talks
Express Open Citation via SADI - Mark Wilkinson et al
This group looked into putting a SADI wrapper around citations to make them more searchable, with a service that takes URIs such as PubMed identifiers and automatically returns their metadata.
Visualising drug-to-protein interaction - Helena Deus et al, AKA 'The Dodgy Plotters'
This group created an RDF representation of drug-protein interactions. The first visualisation displays too much data at once, but overall it shows that drugs have many gene targets. The plan now is to make drug nodes proportionally bigger according to cost and to show other types of drugs. The output diagram shows labels when nodes are clicked.
Working Group around disease localiser - Tomasz Kluza et al
This group created a service to help localise diseases geographically. It was intended to concentrate on malaria, but a service for this, based on data from news feeds, was found to be already available, so attention was turned to cancer. The team e-mailed a contact at the project, who makes information about diseases available on the web, and volunteered to help extend their services when it transpired that the service was due to be stopped soon. A sample database for one type of cancer was established thanks to interaction and discussions with other members of the Hackathon. In relation to cancer, the group looked at the geographic spread of cancer in smokers and referred to healthmap.org.
Text Mining and the Large Knowledge Collider - Reinout van Schouwen
Reinout investigated using Peregrine, a text-mining technology, and integrating it into TodoBox, the mining client. He looked at natural-language text in cancer research and how best to use it to compile data.
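Concept recognition of the kind Peregrine was used for here (and that the post-it notes propose wrapping as a LarKC plugin) can be done by looking up text against a thesaurus of concept names. The toy sketch below illustrates that general dictionary-lookup technique with greedy longest matching over word tokens. It is not Peregrine's API or its actual algorithm, and the thesaurus entries and concept IDs are hypothetical.

```python
import re

# HYPOTHETICAL mini-thesaurus: concept ID -> synonyms. Real systems use large
# controlled vocabularies; these IDs are placeholders.
THESAURUS = {
    "C:0001": ["breast cancer", "breast carcinoma"],
    "C:0002": ["tp53", "p53"],
    "C:0003": ["smoking", "tobacco use"],
}


def build_index(thesaurus):
    """Map each synonym, as a tuple of lower-cased tokens, to its concept ID."""
    index = {}
    for concept, synonyms in thesaurus.items():
        for synonym in synonyms:
            index[tuple(synonym.lower().split())] = concept
    return index


def recognise(text, index):
    """Greedy longest-match lookup; returns (start token, end token, concept ID)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(key) for key in index)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = index.get(tuple(tokens[i:i + n]))
            if concept is not None:
                hits.append((i, i + n, concept))
                i += n
                break
        else:
            i += 1  # no synonym starts at this token
    return hits


INDEX = build_index(THESAURUS)
print(recognise("Loss of p53 is common in breast carcinoma linked to smoking.", INDEX))
```

Longest-match-first means "breast carcinoma" is recognised as one concept rather than leaving "breast" to match something shorter; real recognisers add normalisation, disambiguation and much larger vocabularies on top of this basic lookup.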
MIIDI, Cancer Investigations, BibSoup - Jenny Molloy et al
The group argued that open research reports are essential for research to function, as is access to bibliographic data, and their work aimed to create these connections. They took output in MIIDI form and put it into BibSoup. They also took a list of document titles from cancer forums, looked them up in PubMed and put the metadata into BibSoup. They then examined the interface for showing all the metadata: displays, timeline, map display, etc. They looked at filtering data for patients (listing by side effects, outcome, availability of the drugs, etc.) as well as adding a mechanism to upload lay summaries, which would be sent by e-mail to the author for approval.
BioSPRKL Tools and Resources for RDF Search - David Gifford et al
This group developed a BioSPRKL system to query web modules. It analyses datasets and links and automatically generates a user interface; another part of this resource works locally on the PC with datasets. Suggestions for improving the resulting user experience have been taken on board.
Literature Recommendations - Yasunori Yamamoto
Yasunori worked on TogoDoc, looking at the literature already available. SPARQL queries can pick up differences within the literature, and he examined integrating TogoDoc with Peregrine. Data mining is very important when searching for papers and adding narrative to database links, and this work aimed to improve that system.
Suggestions for future events
Hold over 2 full days and include accommodation so time constraints are lessened;
Create a forum for lessons learnt;
Follow up on ideas that come to fruition, revisit in 6 months;
Every few hours, arrange a short break to feedback to the lead or note-taker for each Working Group who can keep notes on progress and bounce ideas around;
There is value in getting together without preamble, but ideal is combination of spontaneity and forethought;
As done by Peter Murray-Rust, stick a label on your T-shirt stating your area of interest, to encourage interaction around topics;
Find a good way to interrupt people mid-creative process, to find out what they're doing when they look busy - Etherpads should be mandatory for this purpose;
Rolling video streams of Twitter, ongoing work, etc. in the background;
Have 'floating' people to put notes on the wiki when other people are busy, perhaps using a flag system.
There will be feedback forms; please complete them, and if you have ideas for sponsors or other events please get in touch with the organising group.
Communication
Mailing Lists
We strongly encourage you to discuss which tools and data sets to use well before the event, by subscribing to the open-science@lists.okfn.org mailing list at http://lists.okfn.org/mailman/listinfo
Wiki
You will all be provided with a login to this wiki; feel free to edit!
Pads
Collaborative online note 'Pads' will be available on the day for real time collaboration, the notes of which will be migrated on to the wiki after the event, see: http://okfnpad.org/swat4ls-hackathon
Hashtag
Tag for event = #devcsi
You will be able to follow announcements about the event via twitter (as well as feeds from blogs and websites etc) by searching for the above tag. If you are new to twitter, please visit, http://www.twitter.com and create an account for yourself. We will be using technologies like this frequently, before, during and after the event.
If you require a Twitter client (software to keep up to date with the latest tweets), several can be found at http://www.twitstat.com/twitterclientusers.html