
=Linguistic Linked Open Data diagram (draft)=

This wiki page provides information about the Linguistic Linked Open Data cloud diagram; its official home page is on the OWLG web site. The image below shows linguistic resources (lexical-semantic resources, corpora, metadata repositories and linguistic databases) that have been published in Linked Data format, or that will be published as such by members of the Open Linguistics Working Group as well as by other individuals and organisations.

http://linguistics.okfn.org/files/2013/10/llod-colored-current-1024x955.png

Click here for an interactive SVG

Background
The Linguistic Linked Open Data cloud is a collaborative effort pursued by several members of the OWLG, with the general goal of developing a Linked Open Data (sub-)cloud of linguistic resources. The diagram is inspired by the Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, and the resources included are chosen according to the same criteria of openness, availability and interlinking. Although not all resources are available yet, we are actively working towards this goal, and subsequent versions of this diagram will be restricted to openly available resources. Until that point, please refer to the diagram explicitly as a "draft".

Availability
The diagram is available under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license: you are free to share (copy, distribute and transmit) the work, to develop your own extensions (adapt, remix) of the work, and to make commercial use of the work, under the condition that you give attribution as specified on the LLOD page of the OWLG.

Links to the current SVG, PDF and PNG versions, as well as to earlier versions, are provided on the LLOD web site.

How to contribute
The current version of the diagram was developed by Christian Chiarcos, Sebastian Hellmann and Sebastian Nordhoff. However, it is officially attributed to the Open Linguistics Working Group as a whole, and any working group member can modify it.

In order to add a new data set to the diagram draft, please follow this procedure:


 * Create a web page, a page in this wiki or a CKAN entry that contains a description of the resource. A data sample would be desirable.
 * Announce the resource on the mailing list.
 * Ask the [mailto:open-linguistics-owner@lists.okfn.org mailing list administrators] to add your data set to the SVG diagram.
 * By asking to be included in the LLOD diagram draft, the data provider promises to ensure that the following conditions will be met at some (undefined) point in the future:


 * The data is resolvable through HTTP,
 * it is provided as RDF,
 * it contains links to another dataset in the diagram, and
 * the entire dataset is available under an open license.


 * Depending on the number of resources that are available under these conditions, we will shift from draft to official status within the next two years. Data sets that don't meet these criteria will then be removed.
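The four conditions above can be read as a simple checklist. The following Python sketch is only an illustration of that reading, not an OWLG tool: the `DatasetRecord` structure, the `unmet_conditions` function and the set of license identifiers are all hypothetical and introduced here for the example, and the record is assumed to be filled in by the data provider rather than verified automatically.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative license identifiers only (an assumption of this sketch);
# this page does not prescribe a particular list of open licenses.
OPEN_LICENSES: Set[str] = {
    "cc-by", "cc-by-sa", "cc-zero", "odc-by", "odc-odbl", "odc-pddl",
}


@dataclass
class DatasetRecord:
    """Hypothetical, self-reported description of a data set."""
    name: str
    resolvable_via_http: bool
    provided_as_rdf: bool
    links_to: List[str] = field(default_factory=list)
    license_id: str = ""


def unmet_conditions(record: DatasetRecord, diagram_datasets: Set[str]) -> List[str]:
    """Return the conditions from the list above that the record does not meet."""
    problems: List[str] = []
    if not record.resolvable_via_http:
        problems.append("data is not resolvable through HTTP")
    if not record.provided_as_rdf:
        problems.append("data is not provided as RDF")
    # A link from the data set to itself does not count as interlinking.
    targets = set(record.links_to) - {record.name}
    if not (targets & diagram_datasets):
        problems.append("no link to another data set in the diagram")
    if record.license_id.lower() not in OPEN_LICENSES:
        problems.append("data set is not under a recognised open license")
    return problems


# Usage with made-up example records.
diagram = {"WordNet", "Wiktionary", "MyLexicon"}
candidate = DatasetRecord("MyLexicon", True, True, ["WordNet"], "cc-by")
print(unmet_conditions(candidate, diagram))
```

An empty result means that, according to the provider's own description, all four conditions are met; any other result lists what is still missing before the shift from draft to official status.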

OPTIONAL: Instead of asking the list administrators, you can also edit the SVG diagram yourself:

 * Check out (clone) the NLP2RDF repository.
 * Go to the directory nlp2rdf.lod2.eu/OWLG/llod in your cloned repository.
 * Read the instructions in nlp2rdf.lod2.eu/OWLG/llod/readme.txt about directory organization. If you consider your contribution a major revision, please replace the files in nlp2rdf.lod2.eu/OWLG/llod (make sure to preserve them in a separate directory named - where YEAR and MONTH correspond to the date of the last modification of the original files from nlp2rdf.lod2.eu/OWLG/llod).
 * Modify nlp2rdf.lod2.eu/OWLG/llod/dev/llod.svg, e.g., using Inkscape. Make sure that the nodes you create and their text fields contain hyperlinks to the web site documenting your efforts.
 * Submit your changes.
 * The directory nlp2rdf.lod2.eu/OWLG/llod from the repository will be checked out to http://nlp2rdf.lod2.eu/OWLG/llod. The diagrams in the wiki and on the OWLG WordPress site are located in that directory. Note that one day may pass until the version on the web server is updated.
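Before submitting an edited diagram, it may help to check that every text label really sits inside a hyperlink. The following Python sketch uses only the standard library and assumes that links are encoded as SVG `a` elements with an `xlink:href` (or plain `href`) attribute, as Inkscape typically writes them. The function name `unlinked_text_labels` and the sample SVG are made up for this illustration and are not part of the NLP2RDF repository.

```python
import xml.etree.ElementTree as ET
from typing import List

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def unlinked_text_labels(svg_source: str) -> List[str]:
    """Return the text of all <text> elements that are not inside a linked <a>."""
    root = ET.fromstring(svg_source)

    # Collect every <text> element nested in an <a> element that carries a link.
    linked_ids = set()
    for anchor in root.iter(f"{{{SVG_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href") or anchor.get("href")
        if href:
            for text in anchor.iter(f"{{{SVG_NS}}}text"):
                linked_ids.add(id(text))

    # Report the remaining labels; itertext() also covers nested <tspan> content.
    missing = []
    for text in root.iter(f"{{{SVG_NS}}}text"):
        if id(text) not in linked_ids:
            missing.append("".join(text.itertext()).strip())
    return missing


# Made-up sample: one linked node and one label without a hyperlink.
SAMPLE_SVG = """
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <a xlink:href="http://example.org/my-resource">
    <circle r="10"/>
    <text><tspan>MyResource</tspan></text>
  </a>
  <text>Orphan</text>
</svg>
"""

print(unlinked_text_labels(SAMPLE_SVG))
```

To check a real file, read its contents and pass them to the function; any label it reports still needs a hyperlink to the page documenting the resource.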

Joint activities
The development of the diagram itself is designed as a joint activity. However, we all pursue different application scenarios, so concrete collaborations will normally not involve all contributors to the LLOD cloud, but rather focus on selected resources from the LLOD cloud and on how their information can be combined. You can help the community by using the diagram above, or extensions of it, on your websites and in your publications, and by mentioning the OWLG. If you need help devising a description of the OWLG as a whole, you can ask the [mailto:open-linguistics-owner@lists.okfn.org mailing list administrators] to provide you with text snippets.

From time to time, we plan to present summaries of the progress of the OWLG and the LLOD cloud at selected conferences, in journals, etc. We would like these to be written on a truly collaborative basis, and any contributor should feel invited to act as a co-author. Normally, we will announce plans for such summary publications by email to the people who have committed to providing their data as part of the LLOD cloud.

So far, we have written a few papers that describe the vision behind the LLOD, motivate it, discuss representative types of linguistic resources using selected examples, and describe its development until October 2011 (i.e., before the community efforts really began). These are published as a separate part in the companion volume of the LDL workshop, and a concise summary of them, extended with a focus on resources relevant to the Francophonie, has been submitted to a special issue of the French NLP journal TAL. With more and more people getting involved, and more resources actually becoming available, these publications should soon be superseded by updated papers written by a larger number of people.

Data sets
In the diagram, every node should be linked to a web page describing its status, providing data (samples), etc. (With an SVG-capable browser, you can just click on the nodes.) For resources that are already publicly available, this should be their CKAN page.

For a number of resources, however, we created temporary pages in the wiki describing their situation. This can be for one of the following reasons:
 * 1) There is no other webpage available, e.g., because the project data has not yet been released.
 * 2) Multiple resources have been grouped together into one node, because they are derived from the same data set (e.g., we have multiple versions of WordNet and Wiktionary).

The data sets documented only in this wiki are:
 * FrameNet (not yet publicly available)
 * WordNet (different WordNet instantiations available)
 * Wiktionary (different Wiktionary instantiations in preparation)
 * Glottolog/Langdoc (to be released in February 2012)
 * ASJP (to be released in March 2012)
 * WALS (in principle available as RDF, but not on CKAN; a dump still has to be obtained and uploaded there)
 * APiCS (to be released in late 2012)
 * QHL (not to be released until copyright problems are solved)

In case you create another wiki page to document a resource, please add it to this list.

Resource types and diagram coloring
We are currently identifying categories of resources in the LLOD cloud in order to provide a colored version of the diagram; see the corresponding discussion site.

=Earlier versions=

This section summarizes important developments of the diagram. For a full overview of the current and earlier versions of the diagram, please see the LLOD web site.

January 2012
After the telco on Dec 14th, 2011, Christian Chiarcos, Sebastian Hellmann and Sebastian Nordhoff developed a new draft for the LLOD cloud in PPTX, which was discussed and extended after the telco on Jan 25th, 2012. Since February 2012, all diagrams have been designed directly in SVG, so this version is deprecated. As of this version, the diagram is available as CC-BY.

http://nlp2rdf.lod2.eu/OWLG/llod/2012/01/llod.png

January 2011
The idea of creating a Linguistic Linked Open Data cloud developed in late 2010. An early draft of what such a Linked Data (sub)cloud of linguistic resources could look like was prepared by Sebastian Hellmann and Christian Chiarcos in January 2011:

http://nlp2rdf.lod2.eu/OWLG/llod/2011/01/llod.png

(Note that this diagram still includes resources that have not been released under open licenses, e.g., Penn Treebank-based corpora. These are to be replaced by open resources with similar characteristics; the Penn Treebank, for example, is to be replaced by the [www.anc.org/OANC Open American National Corpus].)

Independently, several participants of our OKCon-2011 workshop (Berlin, Jun 30th, 2011) had developed similar ideas, and on Oct 24th, 2011, the attendees of the OWLG meeting at ISWC 2011, Bonn, agreed to provide examples of linguistic resources as a first concrete step in the creation of a Linguistic Linked Open Data cloud. See here for the minutes.

We announced a call for data samples on the mailing list; the results were presented and discussed on Dec 14th, 2011, in a Skype teleconference.