Wg/linguistics/minutes/20111024

2011, Oct 24th: real-life meeting @ ISWC 2011

Agenda
 * meetings should be once a month
 * virtual meeting every two months [CC: I guess the first is an artifact and this is correct?]

homework
 * next virtual meeting around Dec 15th, 2011
 * until then: ask WG members to provide RDF data sets as a nucleus for the LLOD cloud
 * everyone should provide two data sets as examples for linguistic resources (see below)
 * what constitutes a linguistic resource?
   * suggestion: everything that can be reasonably applied to naturally occurring text (i.e., unseen text), e.g., thesaurus, dictionary, even a bilingual word list (reasonable := someone uses it for the analysis/modification/manipulation/generation of text)
   * but not every kind of data produced by linguists, e.g., eye movement data from psycholinguistic experiments (relevant for the experiment and probably insightful for the text that belonged to the experiment, but not applicable to other texts)

Introduction
 * OKFN
 * CKAN
 * publication of LLOD cloud
 * CKAN -> DERI [what does that mean?] http://www.deri.ie/

 * LOD cloud exploratory mechanisms
 * relation to LREC [resource map] and OLAC
 * low threshold for submission of data
 * SPARQL?
 * mobilization will only work if it is automated and motivated
 * create a showcase for LLOD data to show benefit to others
 * most corpora are not open, e.g. Tiger (Cimiano)
   * but people work on that, e.g. OpenANC; but right, this is a serious problem
   * at least *some* more recent corpora are, however, designed from the beginning as open resources; examples are the French Synpathy corpus (spoken French + phonology/prosody + syntax) or the Icelandic corpus (don't recall the exact name)
 * everybody should select one or two datasets of their choice and make them available
 * we will then try to establish links between those resources
 * published data can be enriched
 * baseline is browsing and cloud visualization

Goals

 * 1) Legal
 * 2) Technical
 * 3) Overview
 * 4) Agitation

Attending
John, Sebastian H, Sebastian N, Dimitra, Peter, Paul, Philip, Aldo, Johanna, Daniel, Sheerar, Christian [Skype]

Analysis
 * OKCon
 * DGFS workshop
 * 70 members
 * wg still in initial stage
 * generating publicity via workshops
 * mailing list growing
 * many researchers willing to contribute to cloud
 * have to maintain momentum

Brainstorming
 * similar to CKAN
 * quite substantial task
 * similar to UIMA/GATE
 * might not be the most central topic of the wg to create identification
 * create a new resource and ask for integration as tightly as possible into a maximum of other resources
 * Christian Chiarcos can provide a corpus; this would probably be the MASC corpus: www.anc.org/MASC
 * Joint effort RDFisation
 * Workflow repository
 * Challenge
 * see "homework" above

What remains unclear in the outcome, however, is how this challenge can be won. (Perhaps such a joint task, to be carried out within a given time frame, could also be organized differently and less competitively.)

Frequency of meetings

 * location
 * interval