Wg/linguistics/minutes/20120309

Linked Data in Linguistics Workshop 2012
(originally from http://linguistics.okfnpad.org/ldl2012)

Please add your comments and ideas to this document.

DISCUSSION
Friday 13:30-14:00

Join the mailing list: http://lists.okfn.org/mailman/listinfo/open-linguistics
Workshop homepage: http://ldl2012.lod2.eu
Wiki: http://wiki.okfn.org/Working_Groups/Linguistics
http://okfnpad.org/OWLG

Post-proceedings results:
- Wiki page with a list of tools and places to publish data (bitbucket, github, sourceforge)
- Download links for tools and data
- List of linkable resources, which are just HTML pages. What resources can be wrapped?
- Link collection for resources
- Steve will maintain the page for best practices, Linked Data in Linguistics:
  - Best practice for layered privacy?
  - Star system for licenses (see Marieke below)
  - Best practices for different formats: RDF, LMF, LAF
- Test the technical process of registering http://www.ukp.tu-darmstadt.de/scientific-community/edited-book-the-peoples-web-meets-nlp
- Best practices from other communities --> discuss on mailing list

Issues arising
- Are there disadvantages of RDF? What should guide the decision process in the design of new systems?
  - PRO RDF as a database: graph model -> more flexible than relational DB models
  - PRO RDF as a format: easy to parse, especially the Turtle syntax; human-readable
  - It seems that lemon can be reused for dictionaries, but LMF cannot be reused? It's a question...
    - This is not correct: LMF is a metamodel that can be serialized in different ways. In principle, an RDF linearization of LMF is still LMF.
  - PRO SQL as a database
  - PRO standoff XML as a format
  - PRO inline XML as a format: speed and query languages for tree-structured data
  - Too much technical detail?
  - CON non-trivial to model provenance
  - CON non-trivial to model probabilities
- Some papers dealt with similar architectures; what can we learn by comparing them?
- "Data journal": something to be pursued in the context of an open data initiative like the OWLG
  - technical aspects
  - social aspects (acceptance, prestige)
- Most of the presentations did not contain a download link for data/tools. What is the reusability success of this workshop? Will it be possible for anybody to reuse the submitted work?
  - Is it possible to utilize and build upon the ontologies by Pareja-Lora?
  - What about the DTA tool of Blume et al.?
  - Is there a downloadable dump of ISOcat?
    - All data category PIDs also support content negotiation and have an RDF representation.
    - Selections of data categories can be accessed as RDF (same as individual data categories) or DCIF (Data Category Interchange Format); see http://www.isocat.org/rest/help.html
    - And the *dump*? ;)
    - So the dump is: get a selection of data categories as RDF (subset of the specifications) or DCIF (full specifications). If you want everything, use the REST API to get a list of all selections or profiles.
    - There is some planned work on extending the RDF representation.
- Publish links between dictionary entries? The entries in a dictionary might have a closed license, but what if you just publish the links between them?
- What about CKAN?
- Discoverability: (1) how to know / find out what is out there; (2) how do I make people know that I have published a new linked open data set? (perhaps some easy-to-read instructions / use cases / examples)
- What to do about the ISO 639-3 language name identifiers as URIs?
  - http://lexvo.org/id/iso639-3/fra
  - http://glottolog.livingsources.org/resource/languoid/id/38464.xhtml
  - http://glottolog.livingsources.org/resource/languoid/id/fren1271.xhtml (currently 500 Server Error, but will soon be fixed)
  - Distinguishing non-information resource URIs from information resource URIs is important: http://www.ethnologue.com/show_language.asp?code=eng is a URI for a web page, not for a language.
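Content negotiation (mentioned above for ISOcat PIDs) and the information/non-information resource distinction both come down to how a client dereferences a URI. The sketch below, using only the Python standard library, asks for RDF via the Accept header and reports which document URL was actually served. The media types and function names are assumptions for illustration; whether a given server (ISOcat, lexvo.org, Glottolog) honours a particular media type, or separates the language URI from a describing document by redirecting, should be checked against that server.

```python
from urllib.request import Request, urlopen

# Common RDF media types; which ones a particular server honours should be
# checked against its documentation.
RDF_XML = "application/rdf+xml"
TURTLE = "text/turtle"


def rdf_request(uri: str, media_type: str = RDF_XML) -> Request:
    """Build a GET request asking for an RDF representation of `uri`."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, media_type: str = RDF_XML, timeout: float = 10.0):
    """Dereference `uri`; return (URL of the document served, Content-Type, body)."""
    with urlopen(rdf_request(uri, media_type), timeout=timeout) as resp:
        # urlopen follows redirects such as 303 See Other (keeping the Accept
        # header), so geturl() reports the document actually returned, which
        # may differ from the identifier that was requested.
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()
```

For example, `fetch_rdf("http://lexvo.org/id/iso639-3/fra")` returns the URL of whatever document the server chose to serve; comparing it with the requested URI is a quick way to see whether a publisher keeps the identifier for the language apart from the web page about it.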
- Infrastructure: SPARQL endpoint (perhaps this concept "endpoint" could be explained in a few words ;-))
- Infrastructure: any advice on a good triple/quad store and reasoner?
- What are the steps to publish a new linked open data set?
  - Example from cultural heritage: the conversion of the Amsterdam Museum collection database to LOD: http://semanticweb.cs.vu.nl/lod/am/data.html. Perhaps it would be nice to rewrite this or create a tutorial out of it?
  - What happens with SQL data or Excel data, for example? ;-)
  - For yet another format (transformation), coming from the library field, see http://mayor2.dia.fi.upm.es/oeg/index.php/en/downloads/228-marimba
- Open: what does it mean? Can you be open and at the same time restrict access to some information (subject anonymity issues)? Different levels of access.
- Ideas for openness and support? Protecting PI rights.

How to meet the stated goals
- Increase coverage: what would be helpful to bring more data into the LLOD?
- Increase density
- Increase communication between subcommunities
- How are the resources and links accessed?
- Can an organization like OLAC (Open Language Archives Community) be empowered to assist in providing the necessary infrastructure for disseminating knowledge regarding what resources exist, and perhaps in facilitating collaboration? How could such an organization reach an international level?

Next steps
- Another LDL? (LSA, SLE, ESWC (that would be 2013; ISWC 2012 might still be an option), OKCON, ACL, EACL, ECAI, NAACL-HLT, ESSLLI, other ... LREC 2014)
  - Alternate between linguistic venues and Semantic Web venues
  - Marieke and Antonio volunteer to explore possibilities for another workshop
- Joint publications?
- Joint grant proposals? A European Network proposal? That could be one of the first steps...
- Assure persistence of resources
- Cooperations:
  - LATC will maintain linking, based on SPARQL endpoints
  - CLARIN might be asked for hosting
- Which metadata is necessary?
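On the request to explain "endpoint" in a few words: a SPARQL endpoint is simply an HTTP address that accepts SPARQL queries and returns results. Under the SPARQL 1.1 Protocol a query can be sent URL-encoded as the `query` parameter of a GET request, and the result format chosen with the Accept header. The sketch below assumes a hypothetical endpoint URL and that the queried dataset labels languages with `rdfs:label`; the function names are also illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint address; substitute the endpoint of the dataset you query.
ENDPOINT = "http://example.org/sparql"

# Assumes the dataset describes languages with rdfs:label.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE { <http://lexvo.org/id/iso639-3/fra> rdfs:label ?label } LIMIT 10"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Encode `query` as the `query` parameter of a GET request (SPARQL 1.1 Protocol)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_dicts(results: dict) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send a SELECT query to the endpoint and return its rows as plain dicts."""
    with urlopen(sparql_get_request(endpoint, query), timeout=timeout) as resp:
        return bindings_to_dicts(json.load(resp))
```

Because the interface is plain HTTP, the same query works against any conforming endpoint; this uniform access is what makes it feasible to maintain links across datasets on the basis of their SPARQL endpoints.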
- Perhaps some people would be interested in OKFN hackfests?

Ideas
There is a Semantic Web student course in Leipzig: http://bis.informatik.uni-leipzig.de/de/Lehre/1112/ss/LV/SWP#h1520-5
Students will convert open data sets to RDF; please add some below.

Ideas about data sets to triplify and link to the Linguistic LOD:
- http://www.glottopedia.de
- Ethnologue - wrapper? Probably covered by other data sets
- OLAC
- http://www.crassh.cam.ac.uk/events/1685/
- ODIN (IGT data) http://odin.linguistlist.org/
- Some TDS datasets (need to check licenses)

Other interesting events
- Linked Data Cup http://i-semantics.tugraz.at/i-challenge/call-for-submissions

LOD lessons from other communities
During the sessions, the topic of licensing came up several times. Lately, cultural heritage institutions have become more and more interested in publishing their data in the cloud, and they have come up with a classification system for licenses. Maybe we can learn from them?

**** Public Domain: the data falls in the public domain, or the rights holder has waived all rights. The user can use the metadata for any purpose without restrictions.
*** Attribution License (BY) when the licensor considers linkbacks to meet the attribution requirement: the user can use the metadata for any purpose, provided he retains the attribution link.
** Attribution License (BY) with another form of attribution: the user can use the metadata for any purpose, provided he gives attribution in the way specified by the provider.
* Attribution Share-Alike License (BY-SA): the user can use the metadata for any purpose, provided he gives attribution in the way specified by the provider. Unlike the other "star" options, the metadata can only be combined with data that allows redistribution under the terms of this license.
(http://lod-lam.net/summit/2011/06/06/proposed-a-4-star-classification-scheme-for-linked-open-cultural-metadata/)

Ontology Alignment Evaluation Initiative
What I couldn't come up with during my presentation: http://oaei.ontologymatching.org/

Slides
The slides of Marieke's presentation are online at: http://www.slideshare.net/MvanErp/ldl2012