From Open Knowledge Foundation
Appendix: Additional Detailed Comments
Overview
This document is a response to the call for comments on a draft released by the Working Group on the Future of Bibliographic Control on 30th November 2007 [1].
We think it is laudable that the Working Group have recommended that the Library of Congress takes a more active role in leading the library world into the 21st century. Their vision of a bibliographic control ecosystem which is "collaborative, decentralized, international in scope and web-based" (p. 1) is timely.
However, we are concerned that there is no explicit mention of the potential benefits of open licensing for bibliographic data. Over the past few years, open licensing has facilitated the explosive growth of a 'knowledge commons'. To give a few prominent examples: Open Access journals, Open Educational Resources and Open Data in scientific research [2] have all been enabled by licenses which permit material to be freely re-used and re-distributed [3].
We believe open licensing would strongly help to catalyse the flourishing of an information ecology for bibliographic data - by allowing and encouraging anyone to share, modify and build on it. Openly licensed bibliographic data would allow users and developers to:
- improve the quality of the data by correcting errors, and adding ancillary information;
- attempt to harmonise and integrate data that is from multiple sources, in different formats and which adheres to different sets of standards;
- use technologies such as wikis and versioning systems to facilitate the collaborative development of data [4];
- host bibliographic data and experiment with distributed data provision and access;
- combine bibliographic datasets with other material - such as user-contributed reviews, images and 'tags';
- build innovative (web) applications to explore and represent the wealth of information contained in bibliographic records, e.g. through datamining and/or visualization technologies [5];
- extract structured, machine-readable data from bibliographic records and to link this to other open datasets in the emerging semantic web of data [6].
New kinds of technologies are emerging very rapidly - and we think that one of the best ways for the library community to see the fruits of these developments applied to bibliographic data is to permit greater experimentation with the data by the wider technical community - and the general public. Placing restrictions on how bibliographic records may be re-used effectively inhibits community-led development and innovative 'tinkering'. One of the implicit principles of more 'open' models of development is that 'the most interesting thing to be done with your material will be thought of by someone else'. This kind of thought resonates strongly with the "decentralised", "dynamic", "collaborative" ethos propagated in the report, in which users and third party organisations are encouraged to play a more active role in bibliographic control.
Summary of comments
- The potential benefits of open licensing should be mentioned in the draft. We've identified several places where such mention may be appropriate.
- The draft should strive to acknowledge a broad spectrum of parties who may contribute to an ecosystem of bibliographic control, and who benefit from shared bibliographic data - including individual technical developers, enthusiasts and a diverse variety of third-party organisations - rather than simply libraries, library users and commercial contractors. (Cf. comments on p. 1, par. 1)
- It should be recognised that open licensing can help to lower or remove transaction costs. (Cf. comments on p. 1, par. 1)
- We urge that openly licensing 'raw' bibliographic data still be considered, even if value-added data products or services are sold in order to recover costs. (Cf. comments on p. 4, par. 3)
- The Library of Congress should take into account short and long term opportunities to create 'public value' as well as opportunities for market growth when considering making alterations to its pricing structure. (Cf. comments on p. 8, par. 1; p. 13, sect. 1.1.4)
- The report should explicitly acknowledge significant work by non-profit organisations in the areas of digitisation and bibliographic control as well as contributions of commercial vendors. (Cf. comments on p. 8, par. 2)
- The Library of Congress should take a leading role in encouraging bibliographic data to be shared - encouraging other individual libraries to make their data available under an open license where possible. (Cf. comments on p. 8, par. 5)
- Open bibliographic data would encourage relevant groups to improve and build on each other's work rather than doubling up effort in parallel development. (Cf. comments on p. 9, par. 1)
- A strong culture of sharing bibliographic information may help libraries avoid becoming over-dependent on third-party contractors to replace work currently done by the Library of Congress. (Cf. comments on p. 15, sect. 1.2)
- The products of digitizing material that is in the public domain should be made available under an open license where possible. (Cf. comments on pp. 19-20, sect. 2; p. 21, sect 2.4)
- The Library of Congress should implement changes in metadata standards such that there is a field within each bibliographic record to specify the license the record is available under. (Cf. comments on pp. 21-26, sect. 3)
Comments
N.B. We take 'bibliographic data' to refer to metadata concerning library holdings - primarily in the form of bibliographic records.
Introduction
p. 1, par. 1
"Its realization will occur in cooperation with the private sector, and with the active collaboration of library users."
- The implied distinction - between formal cooperation with the private sector and input from ordinary library users - may become increasingly blurred. We think it would be valuable to recognise that there is a broad spectrum of potential collaborators ranging between these two poles - including individual technical developers and smaller groups who might wish to re-use or add value to bibliographic data without necessarily, e.g., contracting with the relevant producer.
"Data will be gathered from multiple sources; change will happen quickly; and bibliographic control will be dynamic, not static."
- Open licensing would help to ensure that bibliographic control is dynamic and that change happens quickly by eradicating the requirement that every user asks permission from every data producer for each new application of bibliographic information.
"Libraries must continue the transition to this future without delay in order to retain their relevance as information providers."
- As mentioned above, openly licensing bibliographic material would help to accelerate this transition by allowing third parties to experiment with innovative ways of re-using it and building on it - including the development of new kinds of applications, services, plugins, and so on.
Background
p. 4, par. 3
"According to current congressional regulations, LC is permitted to recover only direct costs for services provided to others. As a result, the fees that the Library charges do not cover the most expensive aspect of cataloging: namely, the cost of the intellectual work. . The economics of creating LC's products have changed dramatically since the time when the Library was producing cards for library catalogs. It is now time to reevaluate the pricing of LC's product line in order to develop a business model that allows LC to more substantially recoup its actual costs."
- Reevaluating product pricing is arguably one of several possible routes to cost recovery. Moreover, while LC might recoup costs through revenue from value-added products and services, we hope this does not preclude efforts to encourage the circulation of its raw data.
Guiding Principles
p. 7, par. 3
"Different communities of bibliographic practice have grown up around different resource types: library collections of books and journals, archives, journal articles, and museum objects and images. As these resources and others become increasingly accessible through the Web, separation of the communities of practice that manage them is no longer desirable, sustainable, or functional. Bibliographic control is increasingly a matter of managing relationships—among works, names, concepts, and object descriptions—across communities. Consistency of description within any single environment, such as the library catalog, is becoming less significant than the ability to make connections between environments: Amazon to WorldCat to Google to PubMed to Wikipedia, with library holdings serving as but one node in this web of connectivity. In today's environment, bibliographic control cannot continue to be seen as limited to library catalogs."
- Again, open licensing could be mentioned here, given this projected decentralisation and the importance of widespread collaboration among many different parties.
p. 8, par. 1
"Once considered a public good, information access is today a commodity in a rapidly-growing marketplace. Many information resources formerly managed in the not-for-profit sector are now the objects of a significant for-profit economy. Entities in this latter economy have financial capabilities far beyond those of libraries. Further, they have the resources to engage in large scale research and development."
- We think it is crucial to strike a balance here between encouraging public benefit and market growth. Openly licensed bibliographic data would allow the general public to benefit from being able to freely re-use and re-distribute it, as well as commercial organisations to benefit from being able to re-use it in their products and services. Increased commercial exploitation would also arguably indirectly generate more revenue for government organisations such as LC through an increase in taxable profits. Open licensing also allows community-driven development, which may in some cases yield similar or even preferable results to well-funded closed models of development. Open licensing is also becoming increasingly popular among large for-profit entities, who may, for example, charge for associated services.
p. 8, par. 2
"Libraries of today need to recognize that they are but one group of players in a vast field, and that market conditions necessitate that libraries interact increasingly with the commercial sector. One example of such interaction can be found in the various mass digitization projects in which for-profit organizations are making use of library resources and library metadata."
- It is also important to recognise new partnerships with non-profit organisations in this area - such as the important digitisation work being carried out by the Internet Archive and by The Open Library with members of the Open Content Alliance.
p. 8, par. 5
"Sharing, however, is not a strategy for LC alone. The entire library community and its many partners must also be part of it."
- Again, by advocating liberal licensing practices on a wide scale - the LC could effectively encourage libraries to scale their bibliographic control operations by sharing their data.
p. 9, par. 1
"Is there duplicate effort being expended? Are there possible partnerships that could reduce the burden on the Library?"
- Open licensing in this area would encourage relevant groups to improve and build on each other's work rather than doubling up effort in parallel development.
p. 9, par. 4
"In addition, the standards landscape in the library field is murky, with many different organizations working on similar standards in a non-coordinated fashion."
- See comments on p. 9, par. 1, above.
Findings and Recommendations
p. 11, sect. 1.1
"The Working Group identified three primary areas of redundancy in the bibliographic production process:
- the supply chain, wherein some data are created by publishers and vendors and later re-created by library catalogers;
- the modification of records within the library community, wherein such modifications are not shared, even though they could be useful to others; and
- the expenses that are incurred when individual libraries must purchase records because the sharing of those records is prohibited or restricted."
- This whole section on increased sharing and eliminating redundancies is an opportune place to allude to the potential of open licensing.
p. 12, sect. 1.1.1.1 & 1.1.2.1
"1.1.1.1 All: Be more flexible in accepting bibliographic data from others (e.g., publishers, foreign libraries) that do not conform precisely to U.S. library standards."
"1.1.2.1 All: Develop workflow and mechanisms to use data and metadata from network resources, such as abstracting and indexing services, Amazon, IMDb, etc., where those can enhance the user's experience in seeking and using information."
- It is likely that some form of liberal licensing is requisite for utilising third-party data (1.1.1.1) and for re-purposing existing metadata (1.1.2.1) on a large scale.
p. 13, sect. 1.1.4
"1.1.4 Re-Examine the Current Economic Model for Data Sharing in the Networked Environment
1.1.4.1 LC: Convene a representative group consisting of libraries (large and small), vendors, and OCLC members to address costs, barriers to change, and the value of potential gains arising from greater sharing of data, and to develop recommendations for change. 1.1.4.2 LC: Promote widespread discussion of barriers to sharing data. 1.1.4.3 LC: Reevaluate the pricing of LC's product line with a view to developing a business model that enables more substantial cost recovery."
- We strongly suggest that the public good (or the economic notion of 'social welfare'), in addition to cost recovery, should be taken into account in the analysis of these issues, particularly given the trend-setting role that the report suggests LC should take in the wider world of bibliographic control.
p. 15, sect. 1.2
"Long-term dependence on Library of Congress bibliographic services leaves the users of those services increasingly vulnerable to any changes in them.
Long-term reliance on Library of Congress leadership and on its provision of cataloging records leads libraries—even some large libraries with relatively plentiful staff—to think that they bear no responsibility, individually or collectively, for sharing substantively in the work of bibliographic control."
- Note the same would be true if, for example, more libraries outsourced bibliographic work to 'closed' private contractors to replace core functions that had previously been fulfilled by LC. It seems that a stronger culture of sharing and exchanging data between libraries (perhaps in addition to third party contractors and contributions) is a more sustainable strategy that would leave libraries in a better position in the longer term - and able to do at least some work 'in house'.
p. 16, sect. 1.2
"All types of libraries will contribute to the best of their abilities and resources to the "public good" that comes from bibliographic control and resource sharing."
- Again, we strongly suggest that this is factored into the kinds of discussions and analyses recommended in 1.1.4 (cf. comments on p. 13, sect. 1.1.4).
p. 18, sect. 1.3
"There will be increased sharing of authority data between libraries and between library systems and systems from other communities, with library authority data available to anyone working with bibliographic data. Economies will be realized by minimizing the number of times the same entity needs to be researched. Exchange of information about the same name from one system to another will be made simpler and more reliable. Access to data will be unimpeded and barriers to using data will be minimized."
- This is another opportune moment to mention the potential benefits of open licensing.
pp. 19-20, sect. 2 & p. 21, sect 2.4
"2.4 Encourage Digitization to Allow Broader Access"
- Though, as stated above, our primary interest in these comments is in bibliographic metadata, we also advocate making the digitised images of material that is in the public domain available under an open license where possible.
pp. 21-26, sect. 3
- It would be extremely valuable if LC encouraged all library records to have a standard metadata field that included information on the license of the library record itself.
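To illustrate why a per-record license field would be useful, here is a minimal sketch in Python. It is an assumption-laden example rather than a proposal for any particular standard: the `license` key, the record layout, the sample records and the set of license URLs treated as 'open' are all hypothetical and chosen only for illustration.

```python
# Hypothetical record layout: the "license" key is an illustrative
# assumption, not a field defined by MARC or any existing standard.
records = [
    {
        "title": "Moby-Dick",
        "creator": "Melville, Herman",
        "license": "http://creativecommons.org/licenses/by/3.0/",
    },
    {
        "title": "Walden",
        "creator": "Thoreau, Henry David",
        "license": None,  # no license stated, so re-use rights are unknown
    },
]

# Example set of license identifiers treated as open (an assumption for
# this sketch; a real list might follow http://www.opendefinition.org/licenses).
OPEN_LICENSES = {
    "http://creativecommons.org/licenses/by/3.0/",
    "http://www.gnu.org/copyleft/fdl.html",
}


def is_openly_licensed(record, open_licenses=OPEN_LICENSES):
    """Return True if the record states a license found in open_licenses."""
    return record.get("license") in open_licenses


def reusable_records(records):
    """Keep only the records that a third party could re-use without asking."""
    return [r for r in records if is_openly_licensed(r)]


if __name__ == "__main__":
    for record in reusable_records(records):
        print(record["title"], "-", record["license"])
```

With such a field in place, a developer aggregating records from many libraries could decide programmatically which records may be re-used and re-distributed, instead of asking each data producer for permission.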
p. 23, sect. 3.1
"Library bibliographic data will move from the closed database model to the open Web-based model wherein records are addressable by programs and are in formats that can be easily integrated into Web services and computer applications. This will enable libraries to make better use of networked data resources and to take advantage of the relationships that exist (or could be made to exist) among various data sources on the Web."
- Open licensing could greatly help to facilitate the emergence of such an 'open' model.
p. 28, sect. 4.1
"Library bibliographic data will be used in a wide variety of environments, and interoperability between library and non-library bibliographic applications will increase/improve.
Library catalogs are seen as valuable components in an interlocking array of discovery tools."
- Again, this is a particularly opportune place to mention the possibility of using a liberal license.
p. 31, sect. 4.3.1.2
"4.3.1.2 LC: Provide LCSH openly for use by library and non-library stakeholders."
- Another possible place to mention open licensing.
pp. 33-4, sect. 5.1
- Again, we strongly suggest that the 'public good' be taken into account while building an evidence base. (Cf. p. 13, sect. 1.1.4)
References
All bracketed page numbers refer to the Draft Final Report of the Working Group http://www.loc.gov/bibliographic-future/news/lcwg-report-draft-11-30-07-final.pdf.
[1] Letter from the Working Group – November 30, 2007 http://www.loc.gov/bibliographic-future/news/lcwg-report-memo-11-30-07.pdf
[2] According to the Directory of Open Access Journals http://www.doaj.org/ there are now just under 3000 Open Access journals with over 160,000 articles. See Open Access News http://www.earlham.edu/~peters/fos/fosblog.html for more on open projects in scholarly publishing and research. OER (Open Educational Resources) Commons http://www.oercommons.org/ is a major portal for open course content. Science Commons http://sciencecommons.org/ is a significant proponent of open licensing for scientific research data.
[3] Creative Commons and Talis both maintain open licenses such as the Creative Commons Attribution license and the Open Data License. Another frequently used open license is the GFDL, which Wikipedia's content is licensed under. For a more comprehensive list see http://www.opendefinition.org/licenses .
[4] The Open Library is a prominent project that is currently experimenting with versioning in bibliographic data http://www.openlibrary.org/ .
[5] To give an example, many developers are exploring different uses of the open-source suite of tools from MIT's Simile project http://simile.mit.edu/, which allows large datasets to be represented on a timeline.
[6] The W3C Community Project 'Linking Open Data' http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData, which includes Tim Berners-Lee, is currently pioneering work in this area.