Wg/linguistics/Telcos/2011 December

Schedule

 * start: 11:00 CET, duration approx. 30-60 minutes
 * intro OWLG (~ 2min)
 * brief introduction of participants and their resources (~20 minutes)
 * discussion, especially possible links (see issues below, max. 20 minutes)
 * next steps, milestones, next meeting

Attending

 * + Sebastian Nordhoff (typologist, bibliographical database on lesser-known languages, metadata in RDF, MPI-EVA)
 * + John McCrae (postdoc, Bielefeld, Monnet, multilingual dictionaries as linked data)
 * + Richard Littauer (grad student Saarbrücken & CLARIN, typology, xml metadata, workflows)
 * + Judith Eckle-Kohler (postdoc, UKP Lab, Darmstadt, integration of LSRs, Uby: large-scale integrated resource, German, English, first quarter 2012, not RDF, but LMF)
 * + Michael Matuschek (Uby with Judith, alignment of resources, entity matching between resources)
 * + Christian Chiarcos (U Potsdam --> U Southern California, interoperability of corpora, all RDF)
 * + Jonas Brekle (WP-->RDF, U Leipzig) (skype-id: jonas.brekle)
 * + Steve Moran (post doc, QuantHistLing project, LMU Munich); 11h45


 * - Johanna Voelker
 * - Philipp Cimiano

Introduction Open Knowledge Foundation

 * several working groups
 * WG linguistics founded late 2010 (http://wiki.okfn.org/Wg/linguistics )
 * 1/2011 draft linked open data cloud of linguistic resources
 * 6/2011 workshop at OKCON
 * 10/2011 ISWC RL meeting call for resources

Resources
(a) lexical-semantic resources


 * DBpedia-based wiktionary, Jonas Brekle, in progress


 * JWKTL (Java-based Wiktionary Library, Christian Meyer; earlier RDF dump: http://downloads.dbpedia.org/wiktionary/de.wiktionary0.1.nt.bz2)


 * lemon source, http://monnetproject.deri.ie/lemonsource, including Wiktionary and WordNet, going online soon in the new year


 * Uby, Michael Matuschek & Judith Eckle-Kohler, in progress
 * integration and linking of large-scale lexical-semantic resources
 * will be published in 1st quarter of 2012, format: LMF (XML)
 * no focus on RDF, maybe someone else would be interested in doing this
 * current coverage: FrameNet, WordNet, VerbNet, Wikipedia, Wiktionary, OmegaWiki, GermaNet; all can be made available except GermaNet
 * English and German so far
 * monolingual and crosslingual sense alignments between several resource pairs, e.g. WordNet - Wikipedia, WordNet - Wiktionary, German OmegaWiki - WordNet, VerbNet - WordNet
 * Michael: cross-lingual alignment
 * licensing: same licenses as original resources (i.e. GermaNet license has to be obtained from Tübingen)


 * Steve Moran: "Quantitative Historical Language Comparison" (QHL [QLC ?]) project: digitizing dictionaries, partially copyrighted data, partially open, translation graphs (wordlists, e.g. Spanish to Native American languages); side project: parallel texts
 * Steve Moran: phonological segment inventories with geographic location data, population data, etc.
 * committed to make PHOIBLE available after publication of the dissertation

(b) annotated corpora
 * http://purl.org/powla POWLA (OWL/DL scheme for linguistic corpora)
 * http://purl.org/powla/negra-sample.owl (fragment of the German NEGRA corpus in POWLA)
 * DADA (GrAF/MASC in RDF, converter developed by Steven Cassidy, provided by Nancy Ide and Keith Suderman)
 * SAGA corpus (speech and gesture alignment corpus, text files -> RDF) (John McCrae, Bielefeld)

(c) knowledge bases of general linguistic knowledge
 * http://glottolog.livingsources.org (taxonomy of language and dialect ids, login:glottolog pw:glottolog)
 * languoids: http://glottolog.livingsources.org/resource/languoid/id/362647.xhtml
 * references: http://glottolog.livingsources.org/resource/reference/id/288546.xhtml
 * http://cldbstest.eva.mpg.de/asjp (typological distance measurements between languages)
 * http://purl.org/olia (annotation schemes, terminology for linguistic annotation)
 * http://linguistics-ontology.org/ - GOLD (terminology for linguistic annotation)

Issues

 * How to publish

formats

 * XML, LMF, OWL, RDFs [legacy Uby formats]
 * Uby is represented in Uby-LMF, an instantiation of the meta standard LMF, implemented in XML (DTD specification)
 * http://www.lexicalmarkupframework.org/: RDF specs for LMF
 * LMF2RDF using lemon (John McCrae, currently offline)
 * What sort of XML schema, then? OLAC/CMDI/IMDI/DC...
 * Few resources (time, manpower) at TU Darmstadt
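
To make the LMF-to-RDF question above more concrete, here is a minimal sketch of how an LMF-style XML entry could be turned into N-Triples. It is only an illustration: the XML fragment, its attribute names, the base URI and the vocabulary URI are all placeholders invented for this example. They are not taken from the Uby-LMF DTD, from lemon, or from the LMF2RDF converter mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical LMF-style fragment. Element and attribute names are
# illustrative only and are NOT taken from the Uby-LMF DTD.
LMF_SAMPLE = """
<LexicalResource>
  <Lexicon id="lex_en" languageIdentifier="en">
    <LexicalEntry id="le_1" partOfSpeech="noun">
      <Lemma writtenForm="bank"/>
      <Sense id="s_1"/>
      <Sense id="s_2"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""

# Placeholder namespaces (assumptions); a real conversion would use an
# agreed vocabulary such as lemon instead.
BASE = "http://example.org/lexicon/"
VOCAB = "http://example.org/vocab#"


def lmf_to_ntriples(xml_text):
    """Return a list of N-Triples lines for the entries in xml_text."""
    root = ET.fromstring(xml_text)
    triples = []
    for lexicon in root.iter("Lexicon"):
        lang = lexicon.get("languageIdentifier", "")
        for entry in lexicon.iter("LexicalEntry"):
            subj = f"<{BASE}{entry.get('id')}>"
            lemma = entry.find("Lemma")
            if lemma is not None:
                form = lemma.get("writtenForm", "")
                # Escape backslashes and quotes for the N-Triples literal.
                escaped = form.replace("\\", "\\\\").replace('"', '\\"')
                literal = f'"{escaped}"' + (f"@{lang}" if lang else "")
                triples.append(f"{subj} <{VOCAB}writtenForm> {literal} .")
            # One triple per sense, linking the entry to the sense URI.
            for sense in entry.findall("Sense"):
                obj = f"<{BASE}{sense.get('id')}>"
                triples.append(f"{subj} <{VOCAB}sense> {obj} .")
    return triples


if __name__ == "__main__":
    for line in lmf_to_ntriples(LMF_SAMPLE):
        print(line)
```

Once senses have URIs of this kind, links between resources (e.g. Uby and lemon) could be stated as additional triples without touching the XML source.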

links

 * Uby<->Lemon??
 * information on possible linking of Uby-Wiktionary and other Wiktionary dumps will be posted to the mailing list
 * timeline

corpora + LSR

 * someone has to do the linking
 * future work in Darmstadt, as possible application in text processing
 * something on this in the context of DBpedia

glottolog linking

 * new categories for computational resources => index of LOD resources
 * harvest information from existing repositories on NLP resources and add it to glottolog
 * glottolog can be linked to PHOIBLE (Steve Moran)

Strategies

 * get the LLOD diagram out: top priority

Actions

 * action: responsible person: timeframe
 * draw diagram, Christian Chiarcos, next meeting (Graph creation tools: https://cacoo.com/, http://prezi.com/)
 * repository of publications, Sebastian Nordhoff, next meeting
 * Judith, Michael, Sebastian, Steve: document their license problems on wiki page
 * Create wiki page, Christian Chiarcos, next meeting

Next meeting

 * End of January (23-27), doodle

Commitments
CC: 2 corpora in POWLA, OLiA ontologies

SN: Glottolog, ASJP, early next year

JMcC: WordNet/Wiktionary lemon (Jan), SAGA (later next year)

Uby: Licensing problems (<3/2012), wp, wkt, omegawiki

Jonas Brekle: dbpedia-wiktionary, first dump (en, de) before Christmas 2011

wikipage to document licensing issues (copyright, anonymization issues): Judith, Sebastian N., Steve Moran