From Open Knowledge Foundation
Public Domain Calculator XML Format
Foreword
This document describes how a public domain calculator flowchart should be formatted for inclusion in Europeana’s public domain calculator. It discusses how to convert a flowchart into an XML calculator file and how such a file is constructed, starting with an analysis of a public domain calculator flowchart developed by the Communia network.
Generic European Public Domain Calculator flowchart
We will convert the included flowchart into an XML calculator file that can be read by Europeana’s calculator, using the following steps:
- Identifying and numbering questions
- Identifying question types and creating an XML structure (XSD included below)
Start by opening an empty text file (use a plain-text editor rather than a word processor) and naming it jurisdiction(Language) with the .xml extension. An English flowchart of the generic European calculator would thus be named European(English).xml
Identifying and Numbering Questions
The first thing we need to do is number all the questions and results in the flowchart. It is recommended to write these numbers on the flowchart itself for later reference. This transforms the flowchart above into the one below:
attachment:Numbered generic copyright.jpg
Identifying Question Types
Using this numbered flowchart we can identify four different types of question: multiple choice questions, single input questions, double input questions, and results. Starting with the last, this section explains how a flowchart can be transformed into the XML calculator format.
An XML document contains information together with a description of that information, in a form that a computer can read. Our algorithm looks for information between ‘<questions>’ and ‘</questions>’: everything between those two statements is treated as a question. We start by telling the machine that we have a document of questions by writing the following in an empty file:
<questions> </questions>
All the questions need to be put between these two statements, using the structure described below.
A result
In the example we have two kinds of results: either an object is in the public domain (question 20) or it is not (question 19). To add these results to the XML file, the following structure is used:
First we indicate that this is step 19 of the flowchart
<step nr="19">
Then we indicate that this is a result
<type> result </type>
We tell what kind of message should be displayed to the user
<result> This item does not fall in the public domain. </result>
and we state that all information about this step is given.
</step>
Thus getting this:
<step nr="19">
    <type> result </type>
    <result> This item does not fall in the public domain. </result>
</step>
Note: the indentation is only added for readability.
This then needs to be placed in the list of questions (between the “<questions>” and “</questions>” statements).
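The same result step can also be generated by a program rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name make_result_step is hypothetical and is not part of Europeana's calculator.

```python
import xml.etree.ElementTree as ET


def make_result_step(nr, message):
    """Build a <step> element of type 'result' (hypothetical helper)."""
    step = ET.Element("step", {"nr": str(nr)})
    # Values are padded with spaces to match the examples in this document.
    ET.SubElement(step, "type").text = " result "
    ET.SubElement(step, "result").text = f" {message} "
    return step


# The root element that holds every step of the flowchart.
questions = ET.Element("questions")
questions.append(
    make_result_step(19, "This item does not fall in the public domain.")
)

ET.indent(questions)  # pretty-print; requires Python 3.9 or later
print(ET.tostring(questions, encoding="unicode"))
```

Building the elements through a library rather than by string concatenation means special characters in the message are escaped automatically.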
Multiple choice questions
When we have yes-or-no questions or other multiple choice questions we use a different structure. Take for example the first question (#1) in our numbered flowchart:
What kind of work is it?
- Literary, dramatic or artistic
- Sound recording, broadcast or film
- Unoriginal Database
Each of these directs the user to another question: the first option goes to question 2, the second option to question 3, and the third to question 4.
We then mark up a list of possible answers like this:
<step nr="1">
    <type> multiplechoice </type>
    <question> What kind of work is it? </question>
    <answer>
        <value> Literary, dramatic or artistic work </value>
        <gotoNr> 2 </gotoNr>
    </answer>
    <answer>
        <value> Sound recording, broadcast or film </value>
        <gotoNr> 3 </gotoNr>
    </answer>
    <answer>
        <value> Unoriginal Database </value>
        <gotoNr> 4 </gotoNr>
    </answer>
</step>
We see that the type of this kind of question is ‘multiplechoice’. The step then states the question (what kind of work it is), followed by three possible answers, each with its own value and the number of the step to go to. Once this question is translated, we put it with the other translated questions between the <questions> and </questions> statements.
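To illustrate how a program might follow a multiple choice step, here is a small sketch that parses the step above and looks up the number to go to for a chosen answer. The function name next_step is an assumption made for this illustration. It does not describe how Europeana's calculator is implemented.

```python
import xml.etree.ElementTree as ET

STEP_XML = """
<step nr="1">
  <type> multiplechoice </type>
  <question> What kind of work is it? </question>
  <answer>
    <value> Literary, dramatic or artistic work </value>
    <gotoNr> 2 </gotoNr>
  </answer>
  <answer>
    <value> Sound recording, broadcast or film </value>
    <gotoNr> 3 </gotoNr>
  </answer>
  <answer>
    <value> Unoriginal Database </value>
    <gotoNr> 4 </gotoNr>
  </answer>
</step>
"""


def next_step(step, chosen):
    """Return the step number that follows the chosen answer, or None."""
    for answer in step.findall("answer"):
        # Strip the padding spaces used in the markup before comparing.
        if answer.findtext("value").strip() == chosen:
            return int(answer.findtext("gotoNr").strip())
    return None


step = ET.fromstring(STEP_XML)
print(next_step(step, "Unoriginal Database"))  # 4
```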
An Open Question
An open question can take two forms: a question that compares two inputs, or a conditional question with a single input. The latter is discussed first. Take for example question 4, ‘is the database altered within the last 15 years’. We can use a multiple choice approach, with a yes or no answer, or we can ask an open question, “In what year was the database last substantially altered?”, and have the computer calculate whether that is more than 15 years ago. If we choose the first option we create a multiple choice step as in the previous example. But if we want the computer to calculate, we do the following:
<step nr="4">
    <type> single </type>
    <question> In which year was this work last substantially changed? </question>
    <answer>
        <value> True </value>
        <gotoNr> 20 </gotoNr>
    </answer>
    <answer>
        <value> False </value>
        <gotoNr> 19 </gotoNr>
    </answer>
    <evaluate> NOW-Q1&gt;15 </evaluate>
</step>
Two results are possible: either it is more than 15 years ago (True) or it isn’t (False). We see these as the values of the possible answers.
We also notice a new information field, ‘<evaluate>’, which holds the statement that determines whether the given answer is more than 15 years ago. The statement NOW-Q1&gt;15 might seem difficult, but broken into smaller pieces it becomes quite clear:
- NOW stands for the current year: in 2010 it means 2010, in 2011 it becomes 2011.
- Q1 stands for the answer given to the posed question.
- &gt; is code for the greater-than character (>). Because that character is also used to mark up information for the computer, we write it in this escaped form within our fields.
- 15 is simply a number; any number can be used.
Thus the statement reads: “The current year minus the answer to the question must be greater than 15 to be true, else it is false”. Besides &gt; for the greater-than character, we need to use &lt; for the less-than character (<). The symbols +, - and * can be used as they are.
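Once an XML parser has read the file, the escaped comparison in the evaluate field is an ordinary > again, so the statement reads NOW-Q1>15. The sketch below shows one possible way to evaluate such a statement in Python without calling eval on untrusted text. The function name evaluate and its now parameter are assumptions for this example, and the real calculator may work differently. Only the operators named above (+, -, *, > and <) are accepted.

```python
import ast
import operator
from datetime import date

# Operators mentioned in this document: +, -, * and the comparisons > and <.
_ARITHMETIC = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_COMPARISON = {ast.Gt: operator.gt, ast.Lt: operator.lt}


def evaluate(expression, answers, now=None):
    """Evaluate e.g. 'NOW-Q1>15' with Q1, Q2, ... taken from `answers`."""
    names = {"NOW": date.today().year if now is None else now}
    for i, value in enumerate(answers, start=1):
        names[f"Q{i}"] = int(value)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _ARITHMETIC:
            return _ARITHMETIC[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _COMPARISON):
            compare = _COMPARISON[type(node.ops[0])]
            return compare(walk(node.left), walk(node.comparators[0]))
        raise ValueError(f"unsupported expression: {expression!r}")

    # strip() removes the padding spaces used inside the <evaluate> field.
    return walk(ast.parse(expression.strip(), mode="eval"))


print(evaluate("NOW-Q1>15", [1990], now=2010))  # True: 2010 - 1990 = 20
print(evaluate("NOW-Q1>15", [2000], now=2010))  # False: 2010 - 2000 = 10
```

Passing the current year explicitly through now keeps the outcome reproducible. Leaving it out uses the year in which the code is run.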
Double open questions
Sometimes we need to ask more than one question and compare the answers. Take question 9, which asks whether a work was published within 70 years of its creation. Here again we can choose a multiplechoice step, but we can also let the computer compare the answers to two different questions: when was it created, and when was it published? We can do that in the following way:
<step nr="9">
    <type> double </type>
    <question> When was this published? </question>
    <question> When was this work created? </question>
    <answer>
        <value> True </value>
        <gotoNr> 10 </gotoNr>
    </answer>
    <answer>
        <value> False </value>
        <gotoNr> 70 </gotoNr>
    </answer>
    <evaluate> Q1-Q2&gt;70 </evaluate>
</step>
Thus instead of posing one question, we ask two. The first question asked is referred to as Q1 in the evaluation, the second as Q2. Here Q1 is the year of publication and Q2 the year of creation, so Q1-Q2 is the number of years between creation and publication.
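The sketch below illustrates the two mechanics of a double question: the replies are bound to Q1 and Q2 in the order in which the questions appear, and the True or False outcome selects the next step. The helper names bind_answers and branch are hypothetical. The comparison itself is written out in plain Python as publication year minus creation year.

```python
import xml.etree.ElementTree as ET

STEP_XML = """
<step nr="9">
  <type> double </type>
  <question> When was this published? </question>
  <question> When was this work created? </question>
  <answer> <value> True </value> <gotoNr> 10 </gotoNr> </answer>
  <answer> <value> False </value> <gotoNr> 70 </gotoNr> </answer>
</step>
"""


def bind_answers(step, replies):
    """Map Q1, Q2, ... to the replies, in the order the questions appear."""
    questions = step.findall("question")
    if len(questions) != len(replies):
        raise ValueError("one reply is needed per question")
    return {f"Q{i}": int(reply) for i, reply in enumerate(replies, start=1)}


def branch(step, outcome):
    """Return the gotoNr of the answer whose value is 'True' or 'False'."""
    wanted = "True" if outcome else "False"
    for answer in step.findall("answer"):
        if answer.findtext("value").strip() == wanted:
            return int(answer.findtext("gotoNr").strip())
    raise ValueError(f"step has no {wanted} answer")


step = ET.fromstring(STEP_XML)
q = bind_answers(step, [1990, 1900])  # Q1 = publication, Q2 = creation
elapsed = q["Q1"] - q["Q2"]           # publication year minus creation year
print(branch(step, elapsed > 70))     # 10, because 90 > 70
```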
An example file
Below is the combination of the above examples put into one file.
<questions>
    <step nr="1">
        <type> multiplechoice </type>
        <question> What kind of work is it? </question>
        <answer>
            <value> Literary, dramatic or artistic work </value>
            <gotoNr> 2 </gotoNr>
        </answer>
        <answer>
            <value> Sound recording, broadcast or film </value>
            <gotoNr> 3 </gotoNr>
        </answer>
        <answer>
            <value> Unoriginal Database </value>
            <gotoNr> 4 </gotoNr>
        </answer>
    </step>
    <step nr="4">
        <type> single </type>
        <question> In which year was this work last substantially changed? </question>
        <answer>
            <value> True </value>
            <gotoNr> 20 </gotoNr>
        </answer>
        <answer>
            <value> False </value>
            <gotoNr> 19 </gotoNr>
        </answer>
        <evaluate> NOW-Q1&gt;15 </evaluate>
    </step>
    <step nr="9">
        <type> double </type>
        <question> When was this published? </question>
        <question> When was this work created? </question>
        <answer>
            <value> True </value>
            <gotoNr> 10 </gotoNr>
        </answer>
        <answer>
            <value> False </value>
            <gotoNr> 70 </gotoNr>
        </answer>
        <evaluate> Q1-Q2&gt;70 </evaluate>
    </step>
    <step nr="19">
        <type> result </type>
        <result> This item does not fall in the public domain. </result>
    </step>
</questions>
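Because the example file only combines steps 1, 4, 9 and 19, several of its gotoNr values (2, 3, 10, 20 and 70) point to steps that are not defined in it. A complete calculator file should have no such gaps. The following sketch shows one way to list them. The function name dangling_targets is made up for this illustration, and the sample is a shortened stand-in containing only steps 4 and 19.

```python
import xml.etree.ElementTree as ET


def dangling_targets(root):
    """Return the sorted gotoNr values that point to no <step nr="...">."""
    defined = {step.get("nr").strip() for step in root.iter("step")}
    referenced = {goto.text.strip() for goto in root.iter("gotoNr")}
    return sorted(referenced - defined, key=int)


# Shortened stand-in for the example file: steps 4 and 19 only.
SAMPLE = """
<questions>
  <step nr="4">
    <type> single </type>
    <question> In which year was this work last substantially changed? </question>
    <answer> <value> True </value> <gotoNr> 20 </gotoNr> </answer>
    <answer> <value> False </value> <gotoNr> 19 </gotoNr> </answer>
    <evaluate> NOW-Q1&gt;15 </evaluate>
  </step>
  <step nr="19">
    <type> result </type>
    <result> This item does not fall in the public domain. </result>
  </step>
</questions>
"""

print(dangling_targets(ET.fromstring(SAMPLE)))  # ['20']
```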
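After marking up the data in this way you can check its validity using the following XML Schema Document (XSD). Note that this schema expects the wording of a question or result to be placed inside a <text> element, and a result additionally needs an <information> element, so the simplified examples above need these added before they validate.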
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- definition of simple elements -->
    <xs:element name="text" type="xs:string"/>
    <xs:element name="information" type="xs:string"/>
    <xs:element name="value" type="xs:string"/>
    <xs:element name="gotoNr" type="xs:string"/>
    <xs:element name="evaluate" type="xs:string"/>
    <xs:element name="type" type="xs:string"/>
    <xs:element name="param" type="xs:string"/>
    <!-- definition of attributes -->
    <xs:attribute name="nr" type="xs:string"/>
    <!-- definition of complex elements -->
    <xs:element name="question">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="text"/>
                <xs:element ref="information" minOccurs="0"/>
                <xs:element ref="param" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="answer">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="value"/>
                <xs:element ref="gotoNr"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="result">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="text"/>
                <xs:element ref="information"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="step">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="type" minOccurs="0"/>
                <xs:element ref="result" minOccurs="0"/>
                <xs:element ref="question" minOccurs="0" maxOccurs="2"/>
                <xs:element ref="answer" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="evaluate" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute ref="nr" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="questions">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="step" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
Open Knowledge Foundation Public Domain Calculators specifics
The Open Knowledge Foundation also uses the developed flowchart to do calculations for publicdomainworks.net. They have large datasets that they run through these calculators, but because these processes are automated we need to add information to the XML files before they can be used in this way. For each question and possible answer we need to indicate which information from their datasets is needed. These sets of bibliographical information usually contain the following fields:
work
- name
- type
- date (publication date)
- creation_date
- list of persons
person
- name
- type
- birth_date
- death_date
- country
We also have some agreements on how the data that the Open Knowledge Foundation provides is formatted. For example, dates will always be in ISO 8601 format, and the types of work and author also have specific values:
conf
- MAXLIFE = 125
- HUMAN = "person"
- LEGAL = "organization"
- ANONYMOUS = "unknown"
- LITERARY = "text"
- SOUND = "recording"
- PHOTO = "photograph"
- VIDEO = "video"
- DATABASE = "database"
We need to add what information a question needs in order to be answered. For example, when we have a question like
<question>
    <text> What kind of work is it? </text>
</question>
then we want to add that the type of work provides the answer. We do this by adding something like:
<question>
    <text> What kind of work is it? </text>
    <APIparam> work.type </APIparam>
</question>
If it is a multiple choice question we also want the possible answers to be encoded in a similar fashion, thus changing:
<answer>
    <value> Literary, dramatic or artistic work </value>
    <gotoNr> 2 </gotoNr>
</answer>
to:
<answer>
    <value> Literary, dramatic or artistic work </value>
    <APIparam> conf.LITERARY </APIparam>
    <gotoNr> 2 </gotoNr>
</answer>
Here conf.LITERARY holds the value that can be compared with the work.type data referred to in the previous example.
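As an illustration of how an automated run might use these fields, the sketch below resolves dotted names such as work.type and conf.LITERARY against nested dictionaries and follows the answer whose value matches. This is only one plausible reading of the markup: the helper names resolve and next_step, the dictionary layout and the sample work record are all assumptions and are not taken from the Open Knowledge Foundation's code.

```python
import xml.etree.ElementTree as ET

# Values copied from the conf list above; the work record further down is made up.
CONF = {"LITERARY": "text", "SOUND": "recording", "DATABASE": "database"}

STEP_XML = """
<step nr="1">
  <type> multiplechoice </type>
  <question>
    <text> What kind of work is it? </text>
    <APIparam> work.type </APIparam>
  </question>
  <answer>
    <value> Literary, dramatic or artistic work </value>
    <APIparam> conf.LITERARY </APIparam>
    <gotoNr> 2 </gotoNr>
  </answer>
  <answer>
    <value> Unoriginal Database </value>
    <APIparam> conf.DATABASE </APIparam>
    <gotoNr> 4 </gotoNr>
  </answer>
</step>
"""


def resolve(path, context):
    """Look up a dotted name such as 'work.type' in nested dictionaries."""
    value = context
    for key in path.strip().split("."):
        value = value[key]
    return value


def next_step(step, context):
    """Pick the answer whose APIparam equals the question's APIparam."""
    actual = resolve(step.findtext("question/APIparam"), context)
    for answer in step.findall("answer"):
        if resolve(answer.findtext("APIparam"), context) == actual:
            return int(answer.findtext("gotoNr").strip())
    return None


context = {"conf": CONF, "work": {"name": "Example catalogue", "type": "database"}}
print(next_step(ET.fromstring(STEP_XML), context))  # 4
```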