Open Data Licensing

= Guide to Open Data Licensing =

This guide has a new permanent home: http://www.opendefinition.org/guide/data/

'''This version is no longer guaranteed up-to-date. Please update your bookmarks'''

Introduction
This is a guide to licensing data, aimed particularly at those who want to make their data open.

The first section deals with the practical question of how to license your data. The second section discusses what kinds of rights (intellectual property or other) exist in data in various jurisdictions.

Status and Editing
This guide is in a 'beta' state, with much that can be done to improve and extend it. It is in a wiki precisely so that anyone may edit it. So, please, whether you want to add a new section or just fix a typo, feel free to make changes. To edit, just click on the link at the side or bottom of the page. If you want to discuss the content or make general comments, add them to the talk page.

Disclaimer
In addition to the disclaimer in the license linked at the bottom of the page please note that:


 * 1) This information is collected by altruistic individuals most of whom are not lawyers; those who are lawyers are not your lawyers nor experts in your situation. You use this information at your own risk.
 * 2) Nothing in this page should be considered as legal advice.

Licensing your Data
In many jurisdictions there are explicit rights in data, and even where there are not, the situation is uncertain (see the discussion below for more on this). Thus, if you are planning to make your data available you should put a license on it -- and if you want your data to be open this is even more important.

What licenses can you use? For 'open' data we recommend that you use one of the licenses that conform to the Open Definition and are marked as suitable for data. This list (along with instructions for usage) can be found at:

http://www.opendefinition.org/licenses/
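One practical way to "put a license on" a dataset is to ship a small machine-readable metadata file alongside the data. The following is a minimal sketch only, not a prescribed format: the file name `dataset-metadata.json`, the field names and the function names are all assumptions made for illustration. The license name and URL you pass in should come from the list above, applied according to that license's own instructions.

```python
import json
from pathlib import Path


def build_metadata(title, license_name, license_url):
    """Return a metadata record that states the license explicitly.

    The record layout is hypothetical; it only illustrates the idea of
    declaring the license next to the data.
    """
    if not license_name or not license_url:
        raise ValueError("a license name and URL are required so the data's status is clear")
    return {
        "title": title,
        "license": {"name": license_name, "url": license_url},
    }


def write_metadata(directory, metadata, filename="dataset-metadata.json"):
    """Write the record into the dataset directory and return the path written."""
    path = Path(directory) / filename
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

Refusing to build a record without a license reflects the point made above: data published with no stated license leaves users unsure what they may do with it.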

What Legal (IP) Rights Are There in Data (and Databases)
When talking about databases we first need to distinguish between the structure and the content of a database (when we use the term 'data' we shall mean the content of the database itself). As explained in detail in the FAQ prepared by Science Commons [11]:

"Databases usually are comprised of at least four elements: (1) a set of field names identifying the data; (2) a structure (or model), which includes the organization of fields and relations among them; (3) data sheets; and (4) data. All of the Creative Commons licenses can be applied to these elements to the extent that copyright applies to them (and the Dutch and Belgium licenses can also be applied to the data, for reasons discussed in greater detail below. Copyright applies to minimally creative works expressed in a fixed form. In most databases, items (2) and (3) - the structure and the data sheet - will reflect sufficient creativity for copyright to apply. A Creative Commons license applied to these elements will permit copying of these elements under the conditions of the license selected. Field names, such as “Address” for the name of the field for street address information, are less likely to be protected by copyright because they often do not reflect creativity."

Thus, the structural elements of a database will generally be covered by copyright. However, here we are particularly interested in the data. When we talk of "data" we need to be a bit careful because the word isn't particularly precise: "data" can mean a few items or even a single item (for example a single bibliographic record, a lat/long, etc.) or "data" can mean a large collection (e.g. all the material in the database). To avoid confusion we shall reserve the term "contents" for the individual items, and "data" for the collection.

Unlike the situation for material such as text, music or film, the legal situation for data varies widely across countries. Most jurisdictions do, however, grant some rights in the data (as a collection).

This distinction between the "contents" of a database and the collection is especially crucial for factual databases, since no jurisdiction grants a monopoly right in the individual facts (the "contents") even though it may grant right(s) in them as a collection. To illustrate, consider the simple example of a database which lists the melting points of various substances. While the database as a whole might be protected by law, so that one is not allowed to access, reuse or redistribute it without permission, this would never prevent you from stating the fact that substance Y melts at temperature Z.
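The distinction can be made concrete with a toy version of the melting-point database. In this sketch the table name, column names and function names are invented for illustration, and the values are approximate textbook figures. The schema corresponds to the "structure", each row is an item of "contents", and the full set of rows is the "data" as a collection.

```python
import sqlite3


def build_database():
    """Create a toy in-memory melting-point database."""
    conn = sqlite3.connect(":memory:")
    # Structure: the table layout and its field names.
    conn.execute(
        "CREATE TABLE melting_points (substance TEXT PRIMARY KEY, celsius REAL)"
    )
    # Contents: individual facts, one per row (approximate values).
    conn.executemany(
        "INSERT INTO melting_points (substance, celsius) VALUES (?, ?)",
        [("water", 0.0), ("gallium", 29.76), ("lead", 327.46)],
    )
    conn.commit()
    return conn


def single_fact(conn, substance):
    """Look up one item of 'contents'; returns None if the substance is absent."""
    return conn.execute(
        "SELECT substance, celsius FROM melting_points WHERE substance = ?",
        (substance,),
    ).fetchone()


def whole_collection(conn):
    """Return every row: the 'data' as a collection."""
    return conn.execute(
        "SELECT substance, celsius FROM melting_points ORDER BY substance"
    ).fetchall()
```

Repeating the result of `single_fact` is the kind of use no jurisdiction restricts, whereas copying the output of `whole_collection` is where compilation copyright or a sui generis right may apply, depending on the jurisdiction.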

Forms of protection fall broadly into two cases:


 * Copyright for compilations
 * A sui generis right for collections of data

As we have already emphasized there are no general rules and the situation varies by jurisdiction. Thus, below we proceed country by country detailing which (if any) of these approaches is used in a particular jurisdiction.

Finally, we should point out that, even absent any legal protection, many providers of (closed) databases are able to use a simple contract, combined with legal provisions prohibiting the circumvention of access-control mechanisms, to achieve results similar to a formal IP right. For example, if X is the provider of a citation database, it can impose any set of terms and conditions it wants simply by:

 * (a) Requiring users to log in with a password
 * (b) Only providing a user with an account and password on the condition that the user agrees to the terms and conditions
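The two steps above can be sketched in a few lines. This is a hypothetical illustration of the mechanism, not a description of any real provider's system: the class and method names are invented, and the in-memory dictionary stands in for whatever account store and database a provider would actually use.

```python
import hashlib
import hmac
import secrets


class AccountRegistry:
    """Toy access-control gate: no terms accepted, no account; no login, no data."""

    _ITERATIONS = 100_000  # illustrative PBKDF2 work factor

    def __init__(self):
        self._accounts = {}  # username -> (salt, password digest)

    def _digest(self, password, salt):
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, self._ITERATIONS
        )

    def register(self, username, password, accepted_terms):
        # Step (b): an account is only issued if the user agrees to the terms.
        if not accepted_terms:
            raise PermissionError("account refused: terms and conditions not accepted")
        salt = secrets.token_bytes(16)
        self._accounts[username] = (salt, self._digest(password, salt))

    def login(self, username, password):
        # Step (a): access requires a valid username and password.
        record = self._accounts.get(username)
        if record is None:
            return False
        salt, stored = record
        return hmac.compare_digest(self._digest(password, salt), stored)

    def fetch_record(self, username, password, record_id, database):
        # Data is released only to logged-in (and therefore terms-bound) users.
        if not self.login(username, password):
            raise PermissionError("login required")
        return database[record_id]
```

Because every account holder has agreed to the terms, the provider's control rests on contract plus the access barrier rather than on any IP right in the data itself.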

Database Directive
In the European Union there is a database-specific 'Database Directive'. It provides for both copyright and the sui generis right, though with some restrictions on when copyright is available (the old common-law jurisdictions and many others allowed copyright in simple data no matter how 'unoriginal'). Specifically, here is a quote from [3], paras 19-37 and following:

(i) Copyright in the Compilation. ... First, it [the DB directive] defines what is meant by a "database": "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means." [DB Dir Art 3] Then it allows copyright in a database (as distinct from its contents), but only on the basis of authorship involving personal intellectual creativity. This is a new limitation, so far as common law countries are concerned, and one which must presage a raising of the standard of originality throughout British Copyright law. Intellectual judgment which is in some sense the author's own must go either into choosing contents or into the method of arrangement. The selective dictionary will doubtless be a clearer case than the classificatory telephone directory but each may have some hope; the merely comprehensive will be precluded -- that is the silliness of the whole construct.

...

(ii) Database right. In addition there is a separate sui generis right given to the maker of a database (the investing initiator) against extraction or reutilisation of the database. Four essential points may be highlighted:

(1) The right applies to databases whether or not their arrangement justifies copyright and whatever position may be regarding copyright in individual items in its contents.

...

Cf. Directive 96/9/EC ('on the legal protection of databases') at Eurlex or at the old EC 'Information Society' archive.

Pre-Database Directive
Database protection has existed indirectly for quite a while, both in Europe and elsewhere. In Europe many countries traditionally granted copyright-based protection:


 * 1) Common law countries such as the UK always had a 'sweat-of-the-brow' approach.
 * 2) Nordic countries have long had a 'catalogue' right (since the 1950s).
 * 3) Germany used unfair competition law and copyright.
 * 4) The Netherlands had an exception in Van Dale v. Romme, even though it had a very old law that granted copyright in non-original material.

In general, however, continental Europe was tougher because it required a higher standard of 'creativity/originality' before granting copyright.

Australia
Like other common law jurisdictions Australia provides for 'sweat-of-the-brow' copyright on the basis of the application of skill and labour. The relevant decision in this regard is ''Desktop Marketing Systems Pty Ltd v Telstra Corporation Limited''[4] summarized in [5].

Copyright in Data under the Australian Copyright Act 1968 (Cth)
Under the Australian Copyright Act 1968 (Cth)[6] ideas and information may be protected when they are expressed in a "material form", such as written down, entered into a computer or stored in some other machine-readable form. What is protected is not the idea or information in itself, but rather the expression of that idea or information, that is, the form in which the idea or information is expressed. This means that raw data, basic facts or items of information will not, in themselves, attract copyright protection. However, where data, information or facts have been compiled to create a new work, e.g. a dataset or database, that work may be protected by copyright as a compilation if it meets the originality threshold under Australian copyright law.

Compilations are protected in the literary works category, which is defined in s 10(1) of the Copyright Act as including "a table, or a compilation, expressed in words, figures or symbols". Data, metadata or a compilation of numerous items of data or metadata records may be protected by copyright if the compilation meets the originality threshold required for copyright. Any underlying database software may also be protected by copyright as a literary work. The definition of "literary work" in s 10(1) of the Copyright Act includes "a computer program or a compilation of computer programs".

Compilations - Desktop v. Telstra
In Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd (2002)[7], the leading Australian case in this area, the Full Federal Court considered whether Telstra had copyright in its Yellow Pages and White Pages directories containing names, addresses and phone numbers of telephone subscribers in a given region, listed in alphabetical order. Telstra had undertaken substantial labour and incurred substantial expense in compiling and listing the subscriber entries in its White and Yellow Pages directories.

The court held that copyright can be claimed in a compilation which -


 * has been produced as a result of the exercise of skill, judgment or knowledge in the selection, presentation or arrangement of the materials; or
 * has required the investment of a substantial amount of labour or expense to generate or collect the material included in it (the so-called "sweat of the brow" approach).[8]

The court held that Telstra had met the originality threshold, notwithstanding that there may have been minimal intellectual input or creativity in the selection and arrangement of the material in the telephone directories.

The decision in Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd makes it clear that, in Australia, a compilation of data, e.g. a database, may be protected by copyright provided that either a sufficient amount of labour or expense has gone into collecting the data or a sufficient degree of skill, judgment or knowledge has been applied in selecting and organising the data.

The test described by the Full Federal Court in Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd sets a lower threshold for originality than that required in the United States. In Australia, the originality test does not require any degree of creativity to be applied in creating the work, with the result that a purely factual compilation is more likely to qualify for copyright protection in Australia than in the United States.

Assigning and licensing copyright in datasets and databases - further considerations
Owners of copyright in datasets and databases may fully or partially assign their copyright or license another party to use it. To be legally effective, assignments must be (1) in writing, and (2) signed by or on behalf of the assignor. Licences need not be in writing unless the licence being granted is an exclusive one. Copyright datasets and databases can be licensed under a Creative Commons or other similar open content copyright licence.

For further information on Australian copyright law applicable to databases, the licensing of data and issues of open access to data see:


 * The Oak Law Project Report No. 1: Creating a Legal Framework for Copyright Management of Open Access within the Australian Academic and Research Sector (PDF), August 2006, Elect Printing, Canberra;
 * The Oak Law Project and the Legal framework for e-Research Project Report: Building the Infrastructure for Data Access and Reuse in Collaborative Research: An Analysis of the Legal Context (PDF), June 2007, Elect Printing, Canberra;
 * The Queensland Spatial Information Council's Report: Government Information and Open Content Licensing: An Access and Use Strategy, October 2006;
 * Chapter 4 - Copyright, in "Intellectual Property: in Principle", Anne Fitzgerald and Brian Fitzgerald, Thomson, Sydney, 2004; and
 * Chapter 4 - Copyright, in "Internet and e-commerce law: technology, law and policy", Brian Fitzgerald, Anne Fitzgerald et al., Thomson, Sydney, 2007.

Canada
Canada, though also a common law jurisdiction like Australia, has tended to limit the range of IP rights in databases more. In particular, the 2004 case of CCH Canadian Ltd. v. Law Society of Upper Canada included discussions of originality, the 'sweat of the brow' approach and references to the Feist case. However, there was no clear ruling relevant to data licensing, as database rights were not specifically at issue. From [9]:

"Paragraphs 15 to 25 hint towards the question of database protection, and even cite the US case (Feist) on telephone directories, but as is usual, since that question wasn't actually one of the ones that needed to be decided here, they didn't completely decide it. The Court rejects the "sweat of the brow" definition of "original work" (which is the one that leads to "the phone book database is an original work") but also rejects the "creativity" definition (which requires work to be "novel" or "unique"); instead, "originality" is supposed to require "exercise of skill and judgment".  From paragraph 16:  "The exercise of skill and judgment required to produce the work must not be so trivial that it could be characterized as a purely mechanical exercise." That sounds like it could exclude databases.  But then in the next sentence there will be something to annoy the graphic design people (hi, Kate!):  "For example, any skill and judgment that might be involved in simply changing the font of a work to produce "another" work would be too trivial to merit copyright protection as an "original" work."

Cf. Sweat of the Brow, Creativity, and Authorship: On Originality in Canadian Copyright Law, Abraham Drassinower

United States

Overview
The US is a common-law jurisdiction. However, the Feist decision substantially raised the originality 'bar' required for copyright to exist in a compilation. There are excellent summaries of the US situation in [14] and [13a]. [13a] states:

"The US has no database law like the European Union. Databases can be protected by copyright if they qualify as a "compilation". This requires that the items were included into the database because of some creative expression on the part of the collector. For instance a "best of 2004" collection qualifies. This involves an aesthetic judgment about what is the "best". A "complete list of English words" would not, since trying to be complete is not a creative activity."

"Some other legal doctrines are available in special cases. Using someone else's "hot news" may be unlawful. And using electronic spiders (web robots) to extract information from someone else's site may qualify as electronic trespassing."

Thus, while a pure 'database' right does not exist it seems likely that one can obtain copyright in at least some collections of data. Given this uncertainty there is all the more reason to use an explicit license. (Cf. the comments along similar lines of Harlan Onsrud in [1]).

Licensing v. Contracts
There is a lot of confusion in common usage between "licenses" and "contracts". One thing to keep in mind is that a license is not a contract when it comes to certain kinds of data.

US Data outside the US
Furthermore, we should note that even if data in the US had no IP protection, this would not prevent that data from being protected elsewhere (though note that the EU Database Directive has reciprocity stipulations, which mean that a database provider from a jurisdiction that does not provide database protection will not be able to use the rights provided in the directive).

For example, the information provided in [10] appears to suggest that the Library of Congress charges users outside the US for its data.

Feist v. Rural
Feist Publications, Inc., v. Rural Telephone Service Co. was a Supreme Court case from 1991. Rural claimed that Feist infringed their copyright by including portions of their local telephone listings in larger regional directories. The Supreme Court reversed the ruling of the District Court and the Court of Appeals - that Feist infringed copyright - suggesting that "originality, not 'sweat of the brow', is the touchstone of copyright protection in directories and other fact-based works". The fact that Rural, as a telephone company, was obliged to annually publish a telephone directory due to state regulation, was taken into account. Furthermore, it was mentioned that Feist's product would be less marketable if there were gaps in their listings - and that Feist and Rural "compete vigorously". The crucial point however, was that Rural's directory did not constitute a copyrightable 'work'. It lacked originality in the form of selection or arrangements of its parts - described by the court as "a garden-variety white pages directory, devoid of even the slightest trace of creativity". In the absence of original expression in its component parts it was ruled that the listings were not copyrightable.

Feist v. Rural ruling

Federal Government Data
To complicate matters further, US copyright law (17 U.S.C. § 105) denies copyright protection to works of the US federal government. As a result, data produced by federal agencies is generally in the public domain within the US. Note, however, that this does not mean that those who use or build upon that data are necessarily placing their own work in the public domain.

Use Cases
Things to consider:


 * What is covered.
 * What is the boundary of share-alike.
 * Difference between a derivative work and a compilation
 * Attribution requirements for data

Geodata in the UK
See http://www.journalofmaps.com/cgi-bin/blosxom.cgi/GIS/GRADE_Waelde.html

OWL Ontology for Use with Geodata
See mail thread: http://lists.okfn.org/pipermail/okfn-discuss/2007-April/000401.html

Archaeological Data
TODO

Chemical Data
Chemical data, such as that collected in repositories like PubChem and the World Wide Molecular Matrix, though dealing with the physical world (pure facts), will certainly be subject to the same provisions as any other form of data. Thus it is important to apply a license to the data to ensure that its status is clear in those jurisdictions which allow IP rights in data.

Credits
Add your name here if you would like to be listed as a contributor:


 * RufusPollock
 * JonathanGray
 * Various authors from the Oak Law Project

In addition, we'd like to acknowledge the excellent Science Commons FAQ [11], originally put together by Mia Garlick of CC.