Wg/linguistics/minutes 2010 Jan to May

Below, you can find the content of the etherpad (http://okfnpad.org/linguistics) from May 27th

Currently we are planning OKCon: http://okcon.okfnpad.org/linguistics


 * 1) Before May 27th

= Working Group on Open Data in Linguistics =
This EtherPad is used to collect ideas and data in an informal way. After the ideas have ripened a little, we migrate them here: http://wiki.okfn.org/wg/linguistics

Next meeting
January 18th, 2011 in Berlin. Please tell us whether and when you are coming: http://www.doodle.com/mymk7pn62w7ypkqy

Preparation
* Read minutes of last meeting: http://wiki.okfn.org/wg/linguistics/minutes_2010_12_01

Resource Collection
* http://www.oaod2010.de/index.php?id=home
* The Open American National Corpus: http://www.anc.org
* The Manually Annotated Sub-Corpus: http://www.anc.org/MASC

TODO:
* which sources qualify as (legally/technically/..) open?
  * opendefinition.org principles / website? advocacy

Possible Projects
* Guide on best practices for making linguistic data open.
* Maintaining a registry of collections of open corpora, dictionaries and other linguistic resources on CKAN:
  * http://ckan.net/tag/linguistics
  * New group at: http://ckan.net/group

Project Members
* ?
* no formal obligations, willingness to join calls and participate in discussion
* representativeness of members (visibility)

Potential List Members
Note: this is an extremely incomplete list of people who may or may not be interested in contributing to ODL-WG!

* Alexis Palmer, University of Saarbrücken (to be contacted)
* Emily Bender & William Lewis (U Washington) and folks behind the ODIN database/GOLD ontology http://www.csufresno.edu/odin/ (to be contacted)
* Martin Haspelmath, Max Planck Institute for Evolutionary Anthropology (to be contacted)
* Jeff Good, University of Buffalo (to be contacted)
* Laura Welcher, Director, The Rosetta Project, The Long Now Foundation
* Steven Bird, University of Melbourne (to be contacted)
* Peter Wittenburg, MPI Nijmegen (to be contacted)
* Helen Aristar-Dry (Eastern Michigan U) & folks behind http://linguistlist.org/ and http://emeld.org/school/case/index.html
* Kim Gerdes (Sorbonne Nouvelle, Paris)
* Sabine Schulte im Walde (U Stuttgart) and folks at DGfS/CL (https://dgfs.de/cgi-bin/dgfs.pl/coli?lang=de)
* Folks at:
  * http://www.clarin.eu/external/
  * http://cyberling.org/
  * http://lsadc.org/
  * http://www.impact-project.eu/
  * http://www.tc37sc4.org/
  * GSCL (http://www.gscl.org)

Tasks
* Invite other prospective members:
  * Ask around for other people to invite
* Arrange first IRC meeting!
* Discuss WG purpose and projects
* Clarify relation to similar activities (e.g., SIGANN, Language Commons)

Other projects/initiatives

 * http://www.cs.vassar.edu/sigann
 * http://languagecommons.org

Possible activities
* Best practices for publishing linguistic data? (Licensing, rights clearance, annotation standards ...)
* Review of existing resources (top 5 resources, ...)
* License harmonization: http://www.anc.org/OANC/license.txt (todo: http://www.opendefinition.org/licenses/ )

aims

 * website
 * not technical stuff, interfaces, formats (?)
 * social/legal standards for exchanging data
 * best practice guidelines for sharing publications
 * recommend to publish datasets with articles
 * no infrastructure, no standard promotions
 * open to individuals, not (necessarily) institutions
 * interface to other working groups
 * best practices
 * 10 principles
 * map out potential issues
 * but don't scare people off by elaborating the legal issues too much

resources
parallel corpora
 * http://urd.let.rug.nl/tiedeman/OPUS/
 * http://www.umiacs.umd.edu/~resnik/parallel/bible.html (licence ?)

? multimedia corpora

annotated corpora
 * which annotations ?
 * not tool-specific

unannotated corpora
 * available under an open licence (in contrast to plain web corpora)

functional stuff ?
 * corpus compilation scripts to construct corpora from legally problematic web sources

papers & grammars
 * including metadata

? metadata about resources
 * cf. clarin & flarenet

? metadata about ling. phenomena

? databases
 * http://starling.rinet.ru/main.html (licence ?)

? reference of people

? tool index
 * already there: http://annotation.exmaralda.org/index.php/Linguistic_Annotation

? visualizations
 * http://llmap.org/language-search.html, http://www.llmap.org/languages/uuu.html

blog (better not, cf. cyberling)