Working Groups/Linguistics/How to contribute

Requirements
If you wish to contribute to the linguistic linked data cloud and have your resource included in the cloud, you must satisfy the following requirements. These requirements are derived from http://richard.cyganiak.de/2007/10/lod/#how-to-join

1. Resolvable
There must be resolvable http:// (or https://) URIs.

You must be able to provide at least one example URL, e.g., http://www.myproject.com/myresource/example.rdf

2. RDF
They must resolve, with or without content negotiation, to RDF data in one of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples)

The example URL must return a valid RDF document in RDF/XML in response to the following request:

curl -L -H "Accept: application/rdf+xml" http://www.myproject.com/myresource/example.rdf

For more details see http://richard.cyganiak.de/blog/2007/02/debugging-semantic-web-sites-with-curl/
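
The same check can be scripted. The following is a minimal sketch using only the Python standard library. The function names and the list of media types are assumptions made for illustration, and the check looks only at the Content-Type header: it does not validate the returned document, and it will not recognise RDFa embedded in HTML.

```python
import urllib.request

# Media types commonly used for RDF/XML, Turtle and N-Triples
# (an assumed, non-exhaustive list; RDFa in HTML is not covered).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
}


def build_rdf_request(url):
    """Build a GET request that asks for RDF/XML, like the curl call."""
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def is_rdf_media_type(content_type):
    """Return True if a Content-Type header value names a known RDF media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


def resolves_to_rdf(url):
    """Fetch the URL (redirects are followed) and inspect the Content-Type."""
    with urllib.request.urlopen(build_rdf_request(url)) as response:
        return is_rdf_media_type(response.headers.get("Content-Type", ""))
```

Calling resolves_to_rdf with your example URL performs a real HTTP request, so it needs network access.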

3. 1000 Triples
The dataset must contain at least 1000 triples.

If your dataset is greater than 1MB in size it will likely fit this criterion.

If your resource does not meet this requirement, it may be included as a schematic resource (see "Schematic Resources" below).
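
If you have an N-Triples dump, the triple count can be checked directly, because N-Triples writes one triple per line. A minimal sketch, assuming the dump is well-formed N-Triples (the function names are illustrative, and other serialisations such as Turtle or RDF/XML would need a real RDF parser):

```python
def count_ntriples(lines):
    """Count triples in N-Triples text, where each statement occupies one line."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Blank lines and comment lines carry no triple.
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def meets_size_requirement(lines, minimum=1000):
    """Return True if the dump holds at least `minimum` triples."""
    return count_ntriples(lines) >= minimum


# Usage with a hypothetical dump file:
#   with open("dump.nt", encoding="utf-8") as handle:
#       print(meets_size_requirement(handle))
```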

4. Links
''The dataset must be connected via RDF links to a dataset that is already in the diagram. This means, either your dataset must use URIs from the other dataset, or vice versa. We arbitrarily require at least 50 links.''

These links can be made either to another similar resource (such as WordNet) or to a data category resource (such as ISOcat).
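
One way to estimate the number of links is to count the triples in an N-Triples dump that connect a URI from your dataset with a URI from the target dataset, in either direction. The sketch below assumes that each dataset can be recognised by a URI prefix. The prefixes and the function name are hypothetical, and the line matching is deliberately simple rather than a full N-Triples parser.

```python
import re

# Subject, predicate, and the rest of an N-Triples statement up to the final dot.
TRIPLE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(.*?)\s*\.\s*$")


def count_links(lines, own_prefix, target_prefix):
    """Count triples joining a URI under own_prefix with one under target_prefix."""
    links = 0
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # comment line
        match = TRIPLE.match(line)
        if not match:
            continue  # blank or unrecognised line
        subject, _, obj = match.groups()
        # Only IRIs (written as <...>) can be links; literals and blank nodes cannot.
        if not (subject.startswith("<") and obj.startswith("<") and obj.endswith(">")):
            continue
        s, o = subject[1:-1], obj[1:-1]
        outgoing = s.startswith(own_prefix) and o.startswith(target_prefix)
        incoming = s.startswith(target_prefix) and o.startswith(own_prefix)
        if outgoing or incoming:
            links += 1
    return links
```

A result of 50 or more for at least one target dataset would satisfy the requirement as stated above.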

5. Crawlable
Access of the entire dataset must be possible via RDF crawling, via an RDF dump, or via a SPARQL endpoint.

This may be provided by ensuring one of the following:


 * 1) There is an "index" page with links to every part of the resource
 * 2) The dataset is downloadable as a single file or a zip of files
 * 3) All the data is loaded into an RDF store and is queryable using SPARQL (see http://www.w3.org/wiki/SparqlImplementations for a list of SPARQL servers available)
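
If you choose the SPARQL option, a quick way to confirm that the endpoint exposes the data is to ask it for the total number of triples. The sketch below builds a GET URL as described by the SPARQL Protocol, which passes the query text in a `query` parameter. The function name is illustrative and the endpoint address in the usage comment is hypothetical; some endpoints limit or time out full counts.

```python
import urllib.parse

# Counts every triple the endpoint's default graph exposes.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def sparql_get_url(endpoint, query=COUNT_QUERY):
    """Build a SPARQL Protocol GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


# Usage with a hypothetical endpoint:
#   print(sparql_get_url("http://www.myproject.com/sparql"))
```

The resulting URL can be opened in a browser or fetched with curl to see the count.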

6. Linguistic
Your data must be a language resource

We define language resources as follows:

Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases. (from http://www.springer.com/education+%26+language/linguistics/journal/10579)

If you are unsure whether your resource qualifies, feel free to ask on the mailing list.

7. Registered
The data must be registered with CKAN

See "Registering with CKAN" below.

Finally, send a mail to the mailing list.

Assistance
If your data does not meet the above requirements please contact the mailing list.

If you would like the community to verify the quality of your resource and provide comments, we recommend that you register it at the MLODE site:


 * 1) Go to http://code.google.com/p/mlode/
 * 2) Choose "Issues" > "New Issues"
 * 3) Select "Add Dataset" under "Template"
 * 4) Enter a summary and description
 * 5) Click "Submit issue"

You may verify the CKAN entry at http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/validate.php

In addition, the following guides may be helpful:


 * Converting language resources to RDF
 * Language Resources to link to and guidelines
 * Community hosting for language resources

Registering with CKAN

 * 1) First go to http://thedatahub.org/ and create an account
 * 2) Once logged in, choose "Add a dataset"
 * 3) Enter a title, a license and a description
 * 4) Add the following resources, and give each of them a title:
   * a) An example URL
   * b) The URL of the index page, the dump or a SPARQL endpoint (at least one, preferably all three)
 * 5) Click "Add dataset"
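
The information entered in the steps above corresponds to a small metadata record. As a rough sketch only, here is how that record could be assembled for CKAN's `package_create` action, assuming the hub exposes the CKAN action API and accepts the standard fields `name`, `title`, `license_id`, `notes` and `resources`. The helper name and all values in the usage comment are hypothetical; the web form described above remains the documented route.

```python
import json


def build_dataset_payload(name, title, license_id, description, resources):
    """Assemble a JSON-ready body for CKAN's package_create action.

    resources is a list of (title, url) pairs, e.g. the example URL,
    the index page, the dump and the SPARQL endpoint.
    """
    return {
        "name": name,
        "title": title,
        "license_id": license_id,
        "notes": description,
        "resources": [{"name": r_title, "url": url} for r_title, url in resources],
    }


# Usage with hypothetical values:
#   payload = build_dataset_payload(
#       "myresource", "My Resource", "cc-by", "A lexicon published as RDF.",
#       [("Example", "http://www.myproject.com/myresource/example.rdf")],
#   )
#   print(json.dumps(payload, indent=2))
```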

Metadata with CKAN
The LLOD cloud is drawn based on metadata entered into datahub.io. To ensure your resource is included correctly in the cloud please include the following

Under "additional information"


 * 1)   tag showing the number of triples in the resource
 * 2) For each resource you link to a   giving the number of links where xxx is the ID of the target resource in DataHub

Finally, please add one of the following as a tag



Schematic Resources
Schematic resources play an important part in the linguistic linked data cloud. They include, but are not limited to:


 * Resources describing the structure of other language resources using RDF Schemas, the OWL ontology language or similar constraints
 * Collections of data categories that may be used by other resources