
= First Meeting for Working Group on Open Data in Science =

Details

 * When: 2nd June 2009, 1600 GMT
 * Where: #okfn IRC channel on oftc.net (you can connect via Mibbit)

Agenda

 * Panton Principles for Open Data in Science
 * Draft text of principles
 * Open Data in Science prize
 * Brief summaries of benefits of openness in different domains
 * Updates from WG members about relevant projects/activities they know about or are involved in
 * Is It Open Data? Service
 * CKAN for open data in science

Summary
1. Draft "Panton Principles" were discussed.

 * Draft text is now up at Panton Principles
 * There seemed to be general consensus that these should focus on exhortation ("strongly recommend" rather than "must")
 * Suggestion to create appendices relating to specific issues (e.g. formats, specific discussion of non-commercial restrictions)

2. The prize was discussed.

 * What form should the prize take, and should it be general or focused on particular disciplines?
 * Is money needed, or is prestige enough?
 * How do we make it prestigious?
 * Who could help fund or support it?

3. Recent developments were discussed (see the IRC transcript).

 * Also a general request to the community (e.g. the open-science list) for news items on open data in science

Participants

 * Cameron Neylon
 * John Wilbanks
 * Jonathan Gray
 * Rufus Pollock
 * Jenny Molloy
 * Puneet Kishor
 * Egon Willighagen
 * Tom Keays
 * Amrapali Zaveri
 * Michele Mattioni
 * Shreyasee Pradhan
 * + several lurkers!

Transcript
#!irc
17:01 so peter murray rust and jim were going to try to join by dongle as they are travelling
17:01 -!- ssp [ca9c0ae1@widget.mibbit.com] has joined #okfn
17:01 ah - ok was going to ask about pmr
17:01 likely somewhere on the M25 PMR informed us!
17:02 I will need to walk out the door in 40 minutes
17:02 so let's start
17:02 also jean claude sent his apologies
17:02 shall we just go round for who is present?
17:02 perhaps everyone could state their full name for the transcript?
17:02 john wilbanks
17:02 rufus pollock
17:02 jenny molloy
17:02 Puneet Kishor
17:02 Cameron Neylon
17:02 egon willighagen
17:02 Jonathan Gray
17:02 Michele Mattioni
17:03 Tom Keays
17:03 -!- kaythaney [801e10e7@widget.mibbit.com] has joined #okfn
17:03 -!- mib_jvn4v8 [a860c815@widget.mibbit.com] has joined #okfn
17:03 tim: are you there?
17:03 also i think michael nielsen wasn't able to make it today
17:04 alright so we've got an agenda at http://wiki.okfn.org/wg/science/1
17:04 tim hubbard is also here (he dropped in early but seems to be currently online but not present)
17:04 yes
17:05 Amrapali Zaveri
17:05 i note that i have uploaded current draft text of principles to that page in preparation for this meeting
17:05 Shreyasee Pradhan
17:05 -!- mib_jdva71 [500244d7@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
17:06 if anyone has to go early and has anything in particular they'd like to discuss - perhaps they could raise it now otherwise i suggest we go through agenda starting with panton principles...
17:06 So the text of the principles has been bouncing backwards and forwards for a week or so now and I think there is now reasonable agreement on how it looks between the people who have been directly involved
17:06 sorry - jumped the gun there...
17:06 no i thinki it is good
17:07 in fact cameron i suggest you chair this meeting at least until you leave
17:07 -!- AndrewLang [46b9f42e@widget.mibbit.com] has joined #okfn
17:07 i get the feeling you will keep us moving :)
17:07 ok - can do :-)
17:08 i suggest people add any additional items once we've gone through basic agenda rather than before
17:08 I'm here - Tim Hubbard
17:08 so the text as it currently exists has been bounced backwards and forwards betwen myself, Rufus, John, Peter MR and Jordan but hasn't necessarily seen a wider audience so the comments of others here would be useful at this point if there are any major issues that people see
17:09 I like the Principles. Adding a link that clearly explains the difference between PDDL and CC0 (and, alternatively, why or why not ODbL) would be greatly helpful.
17:09 The idea is that it strongly recommends an approach. Peter was somewhat concerned that it looked like legalese so impressions on the "feel" would be useful as well
17:09 the only issue i've not raised (didn't want to slow consensus building) was on technical ones
17:09 I think the "Panton Principles" look good. One suggestion I had might be to reference the Open Data Commons as a source of licensing conventions. http://www.opendatacommons.org/
17:10 i.e., making it legally open is step one
17:10 but if you don't annotate it, you're not doing the right thing either
17:10 can I comment on point 3?
17:10 so step two needs to be a protocol that makes that happen (in reply to John)
17:10 absolutely, and step 0 is actually making available!
17:10 egonw absolutely
17:10 it seems if we're making principles, the idea that 1. make it open, 2. make it comprehensible (seem to go together)
17:11 the argument given for point 3 about data preservation is a good one
17:11 but the first sentence is vague about what is being limited
17:11 one idea would be to appendices or addenda giving guidance on particular areas
17:11 don't want to promote GPL licenses here, but just want to make the parallel...
17:11 GPL licenses specifically allow commercial use, but limits it to such use where the product is GPL too...
17:12 such would not contradict with the argument given
17:12 and actually promote preservation even more
17:12 i think the concern here was the issue of non-commercial restrictions, which are quite frequent
17:12 therefore, I would like to request work out what limitations are strongly discouraged
17:13 commercial entities are generally loathe to touch anything with GPL
17:13 rgrp: yes, apossible clarification could be to discourage license with all commercial activity
17:13 the intention is to strongly discourage non-commercial terms, I agree the sentence is a bit vague (I wrote it I think)
17:14 the RFC drafting process is a pretty good guide - it's what we used in the SC protocol - has standard definitions of should, must not, etc.
17:14 wilbanks: yes, that's something I was thinking too
17:14 -!- mib_jvn4v8 [a860c815@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
17:14 http://www.ietf.org/rfc/rfc2119.txt
17:14 in that way, you could rephrase point 3 also the other way around
17:15 the question is whether we are proposing a standard or recommendations (though in both cases we can use the MUST, MUST NOT etc)
17:15 "license should be used that allow commercial re-use for example for preservation purposes"
17:15 or SHOULD / SHOULD NOT
17:15 which seems to be more in line with the current (dis)couraged
17:16 so is the idea that e.g. STRONGLY -> SHOULD NOT
17:17 rgrp: yes, that sounds good... more in line with wording in most other specifications
17:17 I would be happy with that - although if we use SHOULD/SHOULD NOT we run the risk of getting into the bind of SA again. It would be SHOULD NOT use non-commercial terms and SHOULD be dedicated to the public domain then. Are people happy with that?
17:18 alternatively i note that RFC has recommended
17:18 I have one suggestion
17:18 which is slightly more like the encourage/discouraged ...
17:18 general question... since these are 'principles'... why not focus on the ideal? and use MUST / MUST NOT
17:18 The "should" language is consistent with the Open Database License
17:18 http://www.co-ment.net/text/844/
17:18 is there a reason to encourage but allow other less favorable licenses to be conform these new principls?
17:19 I guess it is a question of tone: we are not in a position to "tell" others what to do though we can exhort them ...
17:19 more that we are in no position to "allow" or disallow - we would hope people adopt and therefore have used softer language
17:19 GFDL is dicourages, but NMRShiftDB could still say conform these principles
17:19 egonw: my preference is to take a strong normative position, but i am in the minority on that onw
17:19 one
17:20 I rather see clear and definate principles
17:20 so that
17:20 it is clear that when
17:20 i.e., i think the ODbL has the potential to enclose public domain data in genomics, for example, if it's allowed under the principles
17:20 something is PP labeled
17:20 but i'm ok with the compromise that it's simply not encouraged for now
17:20 it really *is* Open in the way we would like it
17:21 so that I know what I get when I see the Panton Principles label (icon, bar, ..)
17:21 but maybe this is confusion on the exact purpose of the guide lines
17:21 we already have that in the form of the open data buttons http://opendefinition.org/buttons/ etc
17:21 true
17:22 i think the aim of the PP was more a statement of what we recommend (and what we hope funding agencies might adopt)
17:22 ok, so what is it that one wants to acchieve with this writing then?
17:22 that does not mean you could not use MUST
17:23 we recommend you use those principles which would require you to do ...
17:23 to perhaps make it clear i think people would have a PP button to show their support for the principles and would have a suitable open data (or CCZero, or PDDL) button to show they were actually "doing it"
17:23 for me, the key compromise is that we believe Public Domain is the best practice for data from public research. So my original draft was more softly worded, as a recommendation precisely because we won't get full adoption across the board and the object is to not alienate people
17:24 rgrp: that brings me to another question... NMRShiftDB is GNU FDL because there was nothing better...
17:24 i agree with Cameron here
17:24 agree cameronneylon. People don't mind being shown a better way, but hate being told they are wrong
17:24 +1 cameron
17:24 agreed
17:24 cameronneylon: yes, SHOULD works better for keeping people around
17:24 The purpose for me is to have something we can take to funders as an agreed best practice position. They will use the words "must" and "shall" we only recommend. Does that make sense?
17:25 egonw: right, that is a common problem ...
17:25 I'm happy with SHOULD
17:25 we can always provide an "implementing code" set of paragraphs for funders, or an appendix they can attach to funding letters, that includes must etc
17:25 cameron: yes, i think so. I think we want to keep with encourage/discourage as it is more "polite"
17:25 rgrp: I am a bit worried about the confusion when GNU FDL data repositories also affiliate with the Panton Principles...
17:25 egonw: +1
17:26 Do I take it that broadly people are reasonably happy with the "principles" behind the principles and perhaps a little more working over the wording is required and can happen offline in e.g. Google doc?
17:26 to follow up wilbanks here i think it might make sense to have additional implementing code (or addenda) dealing with e.g. machine tags etc
17:26 cameronneylon: yes
17:26 cameronneylon: absolutely
17:26 PP++
17:26 egonw: right, they can affiliate but to actually support it they need to change their license ...
17:27 rgrp: yes that would be my view
17:27 egonw: it might be nice to develop a set of use cases of things that could be difficult, and then flesh out stories of how they would work in practice
17:27 what i mean by that is, of course they can say "We support these principles" but they'll then need to act on that ...
17:27 wilbanks: yes, and I liked the suggestion you made of an independent website that would enable people to sign up and could have guidance and support information
17:28 usual question of who has the time to put it together of course :-)
17:28 might i propose that we move some of this discussion to the public open-science list, perhaps with a wiki page or google docs place for people to add stuff
17:28 (I also mention this interests of time ...)
17:29 I think that would be a good approach - ok to refine offline and build content
17:29 Are we happy to move on to item 2?
17:29 cameronneylon: SC can put up some in-kind contributions to the site but it really needs to be independent, so suggestions are welcome
17:29 one idea I have mooted would be to have discipline subsectinos to http://opendefinition.org/
17:29 cameronneylon: yes, agenda 2
17:29 e.g. http://opendefinition.org/science/
17:30 rgrp: +1
17:30 yes :) agenda 2
17:31 Open Data in Science prize
17:31 Ok then: Prize for Open Data in science - I think everyone thinks this is a great idea, telling success stories etc. Questions are where to get some money for it from and how to present it
17:31 how much do people think we need?
17:31 i.e. is it a lectureship, linked to a meeting etc?
17:31 I don't think the prize needs to be very much in and of itself. But building something high profile around it would be helpful, but costs money.
17:32 does it need any money at all?
17:32 yes, the prize itself could simply be a new computer, or a travel stipend
17:32 I agree with comeron
17:32 prestige by association is worth way more than any amount of money (ok, not *any* amount of money, but y'know)
17:32 recognition by peers
17:33 so a panel of distinguished judges?
17:33 the question is - could we get some high profile funding types to sign on as judges or somesuch
17:33 yep - for me the best would be if the prize was travel to, and giving a keynote at a relevant big international meeting with a high profile
17:33 so can be just mechanism for nomination, judging panel, announcement
17:33 jwyg: beat me to it
17:33 or famous scientists?

17:33 :-)
17:33 < AndrewLang> winners from previous years
17:33 < AndrewLang> eventually
17:33 John Sulston? George Church?
17:34 who is this prize for? for individuals (yes, travel stipend might mean something) or for agencies (who would yawn at travel stipend)
17:34 definitely for individuals (or projects) in my view
17:34 individuals I think - to help raise the profile of people making things happen
17:34 yes, individuals
17:34 i thought for individuals too
17:34 so, how to recognize agencies that are doing great work in opening up their data?
17:34 I'm not sure scientists would travel to talk at a meeting about open data. perhaps money award would be better
17:35 would even encourage thinking of this as a programmatic way for a discipline to reward a person - a curriculum of sorts - so that there could be scalable promulgation of the idea across disciplines
17:35 -!- ajz [ca9c0ae1@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
17:35 I don't think the money for the award will people encourage to do Open Data
17:35 let the open communities in each discipline find $2000 for a computer, their own judges, give a prize etc
17:35 I think the fame and meeting Open Data peers is more important
17:35 tim: i thought idea would be that they go to some general (high-profile) conference
17:36 tim: thats why I think we need money for a travel award - and a big enough profile meeting that it is worth their while to go to - but then we need to create the meeting...
17:36 (this is from the SC experience - the anthropology people have totally different ideas about data than physics, and so on)
17:36 rgrp yes, this is where science prizes are normally made - at existing meetings
17:36 (and an award to a biologist won't change things outside biology)
17:36 egonw, I agree. Peer recognition is a waay more powerful incentive/motivator than money is
17:37 egonw: i'm not sure purpose, at least initially, is incentives (per se) but rather to recognize achievement (over time incentives will kick in ...)
17:37 wilbanks: like the idea of discipline by discipline award but could this fragment it?
17:37 -!- GrahamS [5c08817e@widget.mibbit.com] has joined #okfn
17:37 so what we would need is senior people within specific disciplines who would champion the creation of a lecture slot at a major disciplinary meeting?
17:37 wilbanks: i agree in the long run but, like Jonathan, wonder whether in the short-run we'd focus on "science" even though it includes a lot of different stuff
17:37 what conference?
17:37 there was talk at Wellcome Trust about supporting a regular openness meeting a while ago...however harder times...
17:37 idea is, we do one - where we have expertise - but we document the process, and feed it to the champions of open across disciplines
17:37 encourage them to DIY
17:38 you would not just need to travel the winner, but also the judges (or at least one PP representative)
17:38 and we come up with a way to certify outcomes
17:38 for cross discipline, needs to be a group of funders.
17:38 -!- AndrewLang [46b9f42e@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
17:38 (honestly people, this is the first time I use IRC and phone is a lot better!)
17:39 that's the problem with different communities of practice tim :-)
17:39 tim: if you can guess an international conference call bridge we'd be delighted ...
17:39 (tim: you get used to it... it's like sitting in a bar and just listening to the one next to you)
17:39 there are plenty of free international call systems...there's also skype
17:39 listening to more than one conversations is very difficult, but reading bits of typing is easier
17:39 second time at IRC will be much better, and so on
17:40 but before we go off at a tangent
17:40 I've got to run now - but there seem like lots of good ideas on the prize, question of figuring out which can be most quickly implemented
17:40 thanks all
17:40 my question is: what could we do now (or at least within next 6m to a year)
17:40 cameronneylon: thanks cameron and we'll post transcript on the wiki
17:40 cameronneylon: take care!
17:40 provide publicity
17:40 maybe blog about prominent contribution to Open Data
17:41 the prize does not have to be perfect to start with
17:41 I could ask at wellcome trust - the idea was to link to the conference centre here - linked a bit to their support of open publishing. However, would need other funders for other decipines
17:41 tim: sounds like a good idea
17:41 wellcome trust is not just life sciences, right?
17:41 in the uk, ukPMC is supported by multiple funders after all
17:41 wellcome trust is just 'human and annimal health'
17:42 but the prize is for any science
17:42 but ukPMC is funded by BBSRC, MRC, WT and others
17:42 not?
17:42 wilbanks: can you think of anyone we could ask in the US?
17:42 SC has strong connectivity to a lot of open groups across sciences at least, once we have something I can reach out
17:42 so it has to be a broader collection of funders. I'll ask
17:43 We probably already know enough people to form a reasonably prestigious panel of judges
17:43 off the top of my head, it's anthro/archeo, geospatial, biology, chemistry, physics, astrophysics, climate
17:43 these are the places we've observed commons-based efforts gaining traction at least
17:43 -!- cameronneylon [82f6841a@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
17:43 in public funded science
17:44 < GrahamS> maybe ask David Lipman @ NIH/NCBI for ideas, I could ask him
17:44 what do people think of having first one cross-discipline and aiming to do discipline specific in future?
17:44 social science is harder - over 50% in the US is funded just by salary
17:44 GrahamS: that would be great
17:45 To summarize:
17:45 < GrahamS> Thanks - Will do
17:45 Lipman and head of WT talk regularly on PMC stuff - also to HMMI. However all restricted to biology
17:45 cross discipline is hard though - it's much easier to post GBs of microarray data compared to anthro data - there are repositories, data annotation standards, etc.
17:45 the equivalent effort in anthro to do a tiny open data project might be heroic
17:45 I think that if you grouped a few funding agencies, others could be brought on board, but would be better to start more than just biology
17:45 have to figure out how to normalize for effort
17:46 right, but this doesn't have to be perfect ...
17:46 at the start we just want to generate attention and appreciation for efforts made
17:46 (prizes could even be split!)
17:46 My question is: in the interests of actually doing something reasonably fast
17:46 just think the rules have to be clear enough - we're going to apply a scale that recognizes effort, not just total deposit amounts
17:47 nods
17:47 Would it be worth going for: distinguished judges + a small cash prize + big announcement by as many of us in the community as possible
17:47 quality rather than quantity
17:47 recognises effort, organisation of data, data standards etc. - there are quite a lot of criteria - almost like oscars
17:47 with funders involvement, lecture slot etc as phase 2 plans
17:48 tim: right but we can always just start with best picture and work up to the rest as time goes by!
17:48 (for one thing IRC is hopeless for people who can't spell :-))
17:49 we can probably do the simple option ourselves within the next 6m or a year while the other options could take significantly longer
17:49 i think it would be good to set a rough time frame and do the best we can
17:49 would the prize recognize only activities related to "depositing data" or also any activities that further the cause of open data? for example, creating a software that helps in some aspect of openness, or evangelizing/activism, or leadership in the government?
17:50 punkish: +1
17:50 punkish: again i think we could afford to be pretty flexible at the start and then narrow down as time goes by
17:50 I see those as quite orthogonal: software and open data
17:50 I will email people at WT tonight - follow up on the open data conference idea of a few years ago...
17:50 if we really feel overwhelmed by nominations we can always split things up
17:51 open data means so many things, i prefer flexibility as per rgrp
17:51 punkish: they don't really go hand-in-hand
17:51 i mean, we should give lipman a lifetime achievement award for open data
17:51 in any case we can do a lot of the basic tasks - suggest judges, collate lists and communities to contact, etc
17:51 entrez is enormously important to the utility of open genomic data etc
17:51 just putting up files of numbers doesn't equal open data
17:51 policy and software matter a lot :-)
17:52 nods
17:52 the key is to make this "open data prize" the epitome of *respect*, something that most scientists/academics understand and strive for...
17:52 personally i think we don't need to worry, at the start, too much about all the messay details. We want to start doing it and we'll discover the issues as we go along ...
17:52 entrez is a front end to INSDC - which is the 3 way international collaboration - NCBI/EBI/DDBJ - could be said that its the international flavour that has kept things open for so long
17:52 right, I think there is enough prize support for software
17:52 (there have been counter pressures)
17:52 association is really the greatest motivator in academia and science, because no matter how much money one makes, some idiot will always make more
17:52 "Running prizes" like "Running code"
17:53 * egonw needs to catch his bus
17:53 nice meeting you, looking forward to the next session
17:53 egonw: thanks for coming
17:53 egonw: yup you too! take care! :-)
17:53 ok, i suggest we close out prizes and move on swiftly
17:53 egonw: bye!
17:54 again we can summarize prize ideas and put this on the open-science list
17:54 nods
17:54 next item is Brief summaries of benefits of openness in different domains
17:55 so i guess a series of case studies?
17:55 i think this was an area where we really want people to either reuse existing materials or get volunteers who commit to writing up something fairly brief
17:56 nods
17:56 any suggestions as to existing materials or people willing to volunteer themselves (or others!)
17:56 perhaps we could start a list of displines on the wiki and assign/suggest names
17:56 this idea was originally proposed by Cameron, I think, back at the workshop we held in London last November
17:56 jwyg, is this for general benefits (elimination of hunger and poverty, peace and goodwill) or specific, documented benefits (from your comment about "case-studies" it seems to be the latter)
17:57 in geospatial we have been documenting case studies in a few different initiatives, so linking to those could be done
17:57 i would say the focus is on latter, but not necessarily limited to
17:58 that sounds great!
17:58 what do people think about this?
17:59 wilbanks: I don't know whether you already have any existing materials we could point/link to here
17:59 fairly easy to write some short notes on human genome project - plenty published that can be pointed to
18:00 tim: that would be great
18:00 we don't have a lot of stuff written down - though we do have some good meeting archives from the one we did in 2006
18:00 (i think tim actually keynoted that one)
18:00 :)
18:00 ok in the interests of keeping things tight on time I propose we end this item here
18:01 but with ACTION: (jwyg) follow up offers of info post meeting
18:01 sure
18:01 < GrahamS> I'm sure Glyn Moody http://opendotdotdot.blogspot.com/ would be willing to write something up if asked.
18:01 -!- kaythaney [801e10e7@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
18:02 I need to leave in 5mins
18:02 ok
18:02 so final item is Updates from WG members about relevant projects/activities they know about or are involved in
18:02 here, as noted on the wiki page, I wanted to bring up registering/maintaining science entries on ckan
18:03 current list is here:
18:03 http://www.ckan.net/tag/read/science
18:03 -!- egonw [~quassel@farmjw093.farmbio.uu.se] has quit [Ping timeout: 480 seconds]
18:03 here's one. there was an meeting in Toronto a couple of weeks ago on pre-publication data release. reported in science. follow up to ft lauderdale meeting in 2003
18:03 based around genomics, but idea to encourage pre-publication data release in other areas
18:04 more later, as document gets finalized
18:04 will you ping across a url/blog link when ready?
18:04 I'll post the science link - I think its open
18:05 i've got to run - thanks everyone - jtw
18:05 re. ckan if anyone knows of any *open* scientific datasets please either add them directly or send them over
18:05 -!- wilbanks [801e10e7@widget.mibbit.com] has quit [Quit: http://www.mibbit.com ajax IRC Client]
18:05 wilbanks: thanks
18:05 wilbanks: thanks for coming!
18:05 I was in New Delhi last week convincing the nascent Indian Institute of Human Settlements (http://www.iihs.co.in/) to adopt and promote a completely open curriculum based on open software, technology, data and publications. Very receptive. Will continue to work on this.
18:05 punkish: that sounds fantastic
18:07 yup, but it will really be fantastic when it is realized. Stay tuned. :-)
18:07 also it would be great if anyone had any specific suggestions about what could be better in ckan
18:07 does anyone else have any updates they'd like to share?
18:08 -!- kaythaney [801f22e9@widget.mibbit.com] has joined #okfn
18:08 ok - got to go - bye
18:08 i better shoot off soon
18:08 tim: thanks for coming
18:08 tim: take care!
18:09 i think we'll close here as we've taken our 1h slot
18:09 nods
18:09 bye everyone