Wg/linguistics/minutes/20130320

What this is about: http://wiki.okfn.org/Wg/linguistics
Where the other minutes have gone: http://wiki.okfn.org/Special:Search/Wg/linguistics/minutes/

Telco March 20th, 02:00pm CET

participants (please add yourself & skype account)
 * Christian Chiarcos
 * Judith Eckle-Kohler
 * Andrea Schalley
 * John McCrae
 * Sebastian Hellmann
 * Pablo Mendes
 * Jonathan Pool
 * Sebastian Nordhoff
 * Richard Littauer (chat)

agenda (please add)
 * project reports
 * Internet Archive/PanLex multilingual legacy dictionary digitization project
 * PanLex (http://panlex.org) is a research and development project sponsored by the Long Now Foundation (http://www.longnow.org), San Francisco
 * Goal: create and publish a database documenting all known translations among all lexemes in all language varieties of the world
 * Status: about 500 million translations among about 19 million lexemes in about 9000 language varieties
 * Linked RDF version produced by Claus Stadler and Patrick Westphal at U Leipzig and reported in http://www.semantic-web-journal.net/system/files/swj422.pdf
 * Partnership (starting in 2013) with the Internet Archive, SF: making printed dictionaries and lexical resources for less documented languages available worldwide
 * Currently selecting resources, aiming for about a thousand items
 * A physical copy of each resource will remain at Internet Archive
 * Internet Archive will digitize (scan and OCR) each one and add the digital version to its Open Lending Library
 * Any person will be able to borrow a digital version, consult it, and return it, one borrower at a time
 * The PanLex project will also analyze the digital versions to obtain lexical translation data for inclusion in the PanLex database
 * developments since September 2012
 * MLODE overview (http://sabre2012.infai.org/mlode )
 * MLOD postproceedings (16 submissions, see here http://www.semantic-web-journal.net/underreview )
 * one review short:
 * http://www.semantic-web-journal.net/content/multiculturalism-and-semantic-interoperability-electronic-health-records-w3c-standard-based#
 * http://www.semantic-web-journal.net/content/lego-unified-concepticon
 * LLOD cloud status ( http://nlp2rdf.lod2.eu/OWLG/llod/llod.svg )
 * ACTION: Sebastian to ask Richard again for the code
 * webservice?
 * ACTION: otherwise find a solution, we need a process for metadata cleaning and maintenance
 * ACTION: metadata categories => mailing list
 * upcoming events
 * Linked Data in Linguistic Typology (http://www.eva.mpg.de/lingua/conference/2013_ALT10/files/theme_sessions.html#session4)
 * 106 Good, Jeff Fine-grained typological investigation of grammatical constructions using Linked Data
 * Overview: This talk discusses the use of Linked Data to create a database of grammatical constructions, emphasizing the potential advantages of Linked Data in such a context
 * 131 Nordhoff, Sebastian Crowdsourcing WALS
 * WALS is a hallmark of typology, but the matrix is sparse. Crowdsourcing can help fill the matrix while taking care of provenance and security issues
 * 138 Shmatova, Mariya Typological NNC-Database: Storage of Cross-Linguistic Data
 * For each language, the database will hold an amount of highly structured data. A numeral-noun construction usually consists of two or three elements: noun, numeral and (in some languages) classifier, and up to 20 features (depending on the language) are used to describe relations between them. Furthermore, each language can have several types of NNCs, differing in word order, syntactic relations, meaning, etc. To this we must add glossed examples and metadata. Multiplying by the number of languages, we get a highly complex data structure compared to a relatively moderate amount of data. Moreover, one might want to rethink this structure as new typological data come into consideration.
 * 151 Beermann, Dorothee & Pavel Mihaylov Linguistic Annotations and Knowledge Representation
 * In our presentation we will discuss annotation and ontology integration, building on work by Chiarcos (2008). We will describe our own annotation model which consists of relations between morphemes, strings of tags (rather than individual ones) and tag classes, to suggest a design beyond the simple 1-1 mapping from tag to grammatical concept. We are particularly interested in the annotation of multi-lingual data from less-documented languages. We furthermore would like to reflect the incremental character of the linguistic annotation process (Mosel 2006a) by promoting a more dynamic integration of ontological knowledge.
 * 167 Moran, Steven Typology with graphs and matrices
 * In this talk we demonstrate how to leverage Semantic Web technologies to transform data in any number of typological databases, e.g. WALS (Haspelmath et al, 2008), Autotyp (Bickel & Nichols, 2002), PHOIBLE (Moran, 2012), ODIN (Lewis, 2006) or individual databases -- along with metadata from Ethnologue (Lewis, 2009), LLMAP (ILIT), Multitree (ILIT, 2009) and Glottolog (Nordhoff et al, 2012) -- into Linked Data. This is the vision of the Linguistic Linked Open Data Cloud (LLOD; Chiarcos et al, 2012).
 * 2nd LDL-2013 Proposal by Thierry Declerck, Philipp Cimiano, Christian Chiarcos, John McCrae, co-located with the generative lexicon conference http://glcon2013.org/
 * Judith: overview of best practices regarding modeling linguistic data in RDF/OWL (literature, tutorials ...) ?
 * start a page in the Wiki
 * Recent output from a workshop in Rome (http://goo.gl/Th2VA) was:
 * just in the process of developing some kind of agreement
 * organizing panel discussions on this topic on upcoming events
 * ACTION for Judith: create Wiki page and then send an email around to announce it on the mailing list
 * ACTION (nice-to-have): describe the problem and make a blog post for permanent reference
 * publications
 * OWLG overview in the Journal of Language Resources and Evaluation (was discussed in September) ?
 * apparently, no progress since then (SN)
 * ACTION: needs to be reassigned
 * ACTION: Delayed to August
 * administrative
 * micro-funding for 100-500 Euros by OKFN
 * ACTION: Sebastian to write an email about cloud generation software
 * web site updates
 * establishing/distinguishing workshop series ?
 * organizational issues
 * PROPOSAL: Make a list of tasks
 * a small number of people from the different communities obliged to keep up with everything, participating in at least every second telco
 * scheduling telcos
 * before each telco go through all the actions and check whether they are done
 * procedure for letters of support
 * schedule blog posts and see that they are written within a reasonable time frame
 * ACTION: write organizational proposals, then merge them
 * 2 weeks ~ April 12th: John McCrae, Sebastian Hellmann, Christian Chiarcos contribute