= The "Panton Principles": Best Practice for Publishing Scientific Data (Draft) =

1. Where data or collections of data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual data elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.

When publishing data make an explicit and robust statement of your wishes.

2. Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described at http://datalicense.page.somewhere [Perhaps: http://opendefinition.org/licenses#Data?]. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged.

Use a recognized waiver or license that is appropriate for data to make your wishes clear.

3. The use of licenses which limit commercial re-use, or which limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations, is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation.

If you want your data to be effectively used and added to by others, it should be open as defined by the Open Knowledge Definition (http://opendefinition.org/1.0/). In particular, non-commercial and other restrictive clauses should not be used.

4. Furthermore, in science it is STRONGLY recommended that data, especially where publicly funded, be explicitly placed in the public domain via the use of the PDDL or CCZero. This is in keeping with the public funding of much scientific research and the general ethos of sharing and re-use within the scientific community.

Explicit dedication of data from public science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge Definition as applied to data.

== Comment (6 July 2009) ==
The ur-form of the Panton Principles as expressed by Peter Murray-Rust was:

“Where a decision has been taken to publish data deriving from public science research, best practice to enable the re-use and re-purposing of that data, is to place it explicitly in the public domain via {one of a small set of protocols e.g. cc0 or PDDL}.”

The draft above expands on and refines this admirably, with one exception: the idea of best practice has gone, in favour of discouragements and recommendations.

Returning to a formulation in terms of best practice may be attractive, in that it could lessen any flavour of telling people what to do, and enhance the prospect -- if and as agreement that the Principles are indeed best practice grows over time -- of a true community norm developing.

== Frequently Asked Questions ==

 * Do I need to include a license with my data?
 * How does the public domain enforce attribution?