OGDCamp 2011 Organizational Identifiers Workshop

From Open Knowledge Foundation

Notes from the OGDCamp 2011 Organizational Identifiers Workshop: Sunday 23rd October 2011, Warsaw, Poland

Rough notes can be found on EtherPad here

Summary

Many projects have a need for re-usable organisational identifiers which can be used to map together data about organisations from different sources, and to consistently identify an organisation within a dataset. This workshop, a satellite event of the 2011 Open Knowledge Foundation Open Government Data Camp, explored different existing efforts in the organisational identifier space, and identified a number of key principles and proposals for action to develop shared approaches, standards and infrastructures for organisational identifier schemes.

Present

Tim Davies (Practical Participation / Aid Info, Facilitator), Chris Taggart (Open Corporates), Ramine Tianati (Southampton University), Rolf Kleef (Open for Change), Alvaro Graves (Tetherless World Constellation / LOGD Project), John Wonderlich (Sunlight Foundation), Kaitlin Lee (Sunlight Foundation), Rufus Pollock (Open Knowledge Foundation), Freiderich Lindenberg (Open Knowledge Foundation / Open Spending), Ruth Del Campo (New York Law School), elf Pavlic, Derota, and by Skype for the first session: Bill Anderson (Development Initiatives / IATI), Dinesh Venkateswaran (Techsoup Global), John Hecklinger (Global Giving), James Robertson (Alterseed)

Key principles

The workshop identified a number of key principles for developing shared standards around organisational identifiers:

What is needed from an organizational identifier: use cases and requirements

There are many different use-cases for organisational identifiers, with overlapping but different sets of requirements. Use cases discussed in the workshop included:

(1) Definitively identifying legal entities

Identifiers should relate directly to the instruments that bring an entity into being, i.e. company registration numbers. It would be helpful to record the relationships between entities, and capture details of their change over time (e.g. ‘X is member of group Y’, ‘X is owned by Y’, ‘X merged with Y’).

(2) Identifying conceptual entities

Although we might talk about ‘Microsoft’, there is no single legal entity which is ‘Microsoft’. Finding ways to relate identifiers to commonplace conceptual entities is useful in a number of cases. Answering the question ‘What is a company?’ or ‘What is a charity?’ turns out to be fairly complex when you are working across borders.

(3) Identifying national, international and supranational organisations

Some schemes only need to identify organisations within a specific jurisdiction, others need to identify organisations across borders, and even to identify international institutions which have no direct country-level registration or identifiers.

(4) Identifying organisations of a particular status

For example, a scheme may only need to cover charities. The nature of a charity varies between jurisdictions. In some, an association or company may exist as an informal or legal entity prior to registering as a charity (so charity is a status of an existing organisation); in others, charity registration may create a new organisation.

(5) Using legacy identifiers

A system may have an internal set of identifiers which are not mapped to shared organisational identifiers, but which could be exposed from the existing system. It would be useful to find ways to map these onto existing shared identifiers.

(6) Providing identifiers where none exist, or none pick-out the required concept

Some organisations which need to be identified do not have an existing identifier. For example: non-constituted associations in the UK, or charities in countries where no registration scheme is available.

Sometimes the scope of existing organisational identifiers does not match the scope required. For example: an organisation's identifier may not clearly communicate the organisation's status (e.g. charity) because of limitations in the registration systems in operation in that organisation's country.

Existing Schemes

A number of existing proposals or schemes for organisational identifiers exist.

IATI Organisation Standard

The draft Organisational Standard of the International Aid Transparency Initiative is currently based on either using an organisational ID from the OECD Development Assistance Committee Code List (which covers a number of international organisations not otherwise registered, and a number of donor government departments), or an identifier of the form:

 COUNTRY-IDENTIFICATION SCHEME-ID

Where COUNTRY is a two-letter ISO country code, IDENTIFICATION SCHEME is a code identifying the registration scheme in use (agreed with the IATI Secretariat or Technical Advisory Group), and ID is the identifier from that scheme. So, for example, the US-based William and Flora Hewlett Foundation can be identified by:

 US-EIN-941655673

Where EIN is a US identification scheme used by charities and companies. And the UK-based Development Initiatives can be identified by:

 GB-COH-06368740

No mechanism is put forward in the draft IATI Organisation Identifier Standard for resolving identifiers to information about the given organisation, or for choosing which identifier to prefer if an organisation has more than one.

The IATI Organisation Identifier is also used as the basis for IATI Activity Identifiers, which take the form:

 ORGANISATION ID-ACTIVITY ID

For example, an activity of Development Initiatives could have the ID:

 GB-COH-06368740-DIPRA4
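The COUNTRY-IDENTIFICATION SCHEME-ID form and the derived activity identifier can be illustrated with a small sketch. This is not part of the draft standard: the names OrgIdentifier, parse_org_identifier and make_activity_identifier are hypothetical, and the sketch assumes that neither the country code nor the scheme code contains a hyphen. Identifiers taken from the OECD DAC code list are out of scope here.

```python
from typing import NamedTuple


class OrgIdentifier(NamedTuple):
    """Hypothetical container for the three parts of an identifier."""
    country: str   # two-letter country code, e.g. "GB"
    scheme: str    # registration scheme code, e.g. "COH"
    local_id: str  # identifier issued by that scheme


def parse_org_identifier(identifier: str) -> OrgIdentifier:
    """Split COUNTRY-SCHEME-ID into its three parts.

    Only the first two hyphens are treated as separators, so a local
    ID that itself contains hyphens is kept intact.
    """
    parts = identifier.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not of the form COUNTRY-SCHEME-ID: {identifier!r}")
    return OrgIdentifier(*parts)


def make_activity_identifier(org_identifier: str, activity_id: str) -> str:
    """Join an organisation identifier and a local activity ID."""
    return f"{org_identifier}-{activity_id}"


hewlett = parse_org_identifier("US-EIN-941655673")
activity = make_activity_identifier("GB-COH-06368740", "DIPRA4")
```

Note that an activity identifier cannot be split back into its organisation and activity parts by pattern alone, since both parts may contain hyphens; a consumer would need to know the organisation identifier in advance.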

Open Corporates URIs

Open Corporates is compiling a database of corporate legal entities across jurisdictions, drawing on publicly available company data, either as open data, or scraped from web sites.

Open Corporates exposes data at http://opencorporates.com/companies/ using the format:

 http://opencorporates.com/companies/COUNTRY OR DISTRICT/REGISTRATION NUMBER

Where COUNTRY OR DISTRICT is an ISO code that picks out a country, or a state/district where company registration is handled at the sub-national level. For example, a US company registered in the District of Columbia with the registration ID L10053 will have the URL:

 http://opencorporates.com/companies/us_dc/L10053

This URL returns human-readable data about the company. Appending .xml, .json or .rdf to this URL will return machine-readable data. Open Corporates is specifically concerned with providing identifiers for companies.
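As a rough sketch of the URL pattern just described, the following builds the human-readable and machine-readable addresses for a company. The function name company_url and its parameters are hypothetical; only the base address, the path layout and the three suffixes come from the notes above.

```python
from typing import Optional

BASE_URL = "http://opencorporates.com/companies"
MACHINE_READABLE = ("xml", "json", "rdf")  # suffixes named in the notes


def company_url(jurisdiction: str, registration_number: str,
                fmt: Optional[str] = None) -> str:
    """Build a company URL, optionally with a machine-readable suffix.

    jurisdiction is the country or state/district code, e.g. "us_dc".
    """
    url = f"{BASE_URL}/{jurisdiction}/{registration_number}"
    if fmt is None:
        return url  # human-readable page
    if fmt not in MACHINE_READABLE:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{url}.{fmt}"


page = company_url("us_dc", "L10053")
data = company_url("us_dc", "L10053", "json")
```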

Identity Hub

The Linking Open Government Data (LOGD) project at the Tetherless World Constellation has put forward a series of design principles for URIs for US Linked Government Data based on the URI template:

 'http://' BASE '/' 'id' '/' ORG '/' CATEGORY ( '/' TOKEN )

Allowing, for example, URIs of the form:

 http://BASE/id/us/fed/agency/Commerce/National_Oceanic_and_Atmospheric_Administration

Here BASE can be replaced with any service which can resolve the required URI and provide data about it.
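The template can be read as a simple path-building rule. The sketch below is an illustration only: the function name logd_uri is hypothetical, and these notes do not spell out how the example URI divides into ORG, CATEGORY and TOKEN, so the split used in the usage lines (and the assumption that several tokens may follow the category) is a guess.

```python
def logd_uri(base: str, org: str, category: str, *tokens: str) -> str:
    """Assemble 'http://' BASE '/id/' ORG '/' CATEGORY ('/' TOKEN)...

    Each argument is inserted as given; no escaping is attempted.
    """
    segments = [org, category, *tokens]
    return "http://" + base + "/id/" + "/".join(segments)


# Assumed split of the example: ORG = "us/fed", CATEGORY = "agency".
noaa = logd_uri(
    "BASE", "us/fed", "agency",
    "Commerce", "National_Oceanic_and_Atmospheric_Administration",
)
```

Because BASE is a parameter, the same call with a different base yields the equivalent URI on another resolving service.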

Global Giving Collaboration

Global Giving and other partners are working on the development of an identifier scheme particularly for use case 6: providing identifiers where none exist, or the scope of existing identifiers is inappropriate.

This is likely to involve facilitating registration of new identifiers for some organisations.

Wider Initiatives

There are a number of other actors, initiatives and other ongoing projects in the organisational identifier space. The workshop identified the following:

Working Groups

The afternoon of the workshop involved focussed work on three topics:

Architectures

This working group identified a possible architecture building upon the IATI Identifier model but suggesting:

Allowing for multiple namespaces

Such that an identifier could be of the form:

 US-NY-DMV-AA-12345  

Where US is the Country (top-level namespace), NY is a second-level namespace identifying State, DMV is a third-level namespace identifying the registration or identity scheme in use, AA identifies a set of categories within this identification scheme, and 12345 is the relevant identification number.

(Note: following reflection, it might be more appropriate to reverse the namespace so that identification scheme type is the top-level category, e.g. COH-GB-12345 for UK company, as this makes it easier to declare and use resolution services)

Providing a light-weight ‘authority list’ of namespaces

Namespaces would generally be hierarchical under countries, but a number of top-level namespaces would be provided, including OECD-DAC- and other relevant general identification schemes. A central point is needed to provide an authoritative list of namespaces.

In the medium term the authority list will need some governance structure, with a process for agreeing which namespaces are added, and for registering resolution services against namespaces. This might consist of a small virtual committee of interested parties, working via a consensus-based e-mail list to respond to proposals for new namespaces.

Whilst the authority list could follow a DNS model and delegate control over a set of top-level namespaces to sub-authorities, this was deemed too complicated for initial implementation.

In the short term, a simple file will suffice listing:

Providing a resolution service standard

Anyone should be able to declare a service to resolve identifiers in a particular namespace.

For example, Open Corporates may declare that it will resolve any identifiers in company namespaces, and provide a base URI of http://opencorporates.com/companies/id/.

An application that has the ID GB-COH-06368740 could then look up in the authority list that Open Corporates provides a resolver, and could append the ID to the opencorporates.com base URI to fetch back data on the organisation in question.
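The lookup described here could work roughly as follows. This is a sketch under stated assumptions, not an agreed design: the authority list is modelled as a plain dictionary from namespace prefix to resolver base URI, the single entry shown is the hypothetical declaration from the example above, and the function names are invented for illustration. Matching the longest registered prefix lets the same code cope with multi-level namespaces.

```python
from typing import Dict, Optional

# Hypothetical authority list: namespace prefix -> resolver base URI.
AUTHORITY_LIST: Dict[str, str] = {
    "GB-COH": "http://opencorporates.com/companies/id/",
}


def find_resolver(identifier: str,
                  authority_list: Dict[str, str] = AUTHORITY_LIST
                  ) -> Optional[str]:
    """Return the base URI for the longest registered namespace prefix."""
    parts = identifier.split("-")
    # Try longer namespaces first, always leaving one part as the ID.
    for n in range(len(parts) - 1, 0, -1):
        namespace = "-".join(parts[:n])
        if namespace in authority_list:
            return authority_list[namespace]
    return None


def resolution_url(identifier: str,
                   authority_list: Dict[str, str] = AUTHORITY_LIST
                   ) -> Optional[str]:
    """Append the full identifier to the resolver's base URI, if any."""
    base = find_resolver(identifier, authority_list)
    return None if base is None else base + identifier


url = resolution_url("GB-COH-06368740")
```

An identifier in a namespace with no declared resolver simply yields None, which matches the idea that resolution is optional for any given namespace.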

Resolution services should return a standard set of data, including, wherever possible, details of related organisations.

The resolution standard should include provision of an ‘at_time’ parameter, so that if a resolution service is able to provide data for past periods this can be requested. For example, a consuming application may have data recording a transaction with a company in 2005. If it requests data on that company from a resolution service with, for example, an ?at_time=2005-01-01 parameter, then the service should return the details of the company as of that time (if known).

With access to the authority list, and an existing organisational identifier from one of the namespaces registered in the authority list, anyone should be able to construct a standardised organisational identifier. If a resolution service is available for the namespace in question, it should be possible to look up details on that organisation, which will hopefully include the relationships of this identifier to other relevant identifiers.
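A consuming application might attach the proposed at_time parameter to a resolution request along these lines. The helper name with_at_time is hypothetical, and the ISO date format is taken from the 2005-01-01 example rather than from any agreed specification.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_at_time(url: str, when: date) -> str:
    """Return url with its at_time query parameter set to the given date.

    Any existing at_time value is replaced; other parameters are kept.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k != "at_time"]
    params.append(("at_time", when.isoformat()))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


historic = with_at_time(
    "http://opencorporates.com/companies/id/GB-COH-06368740",
    date(2005, 1, 1),
)
```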

Other points from the group included

Common terms

The common terms group worked on a preliminary typology of relationships between organisations that could provide the basis for some standard sets of information that resolution services should attempt to provide when they can.

Preliminary Typology of Relations

Note: "supplier to" and "donates to" indicate that there are multiple transactions, whether or not these are available as separate facts.

Temporal relationships also need to be captured.

A split into B, C, ... (A ceases to exist, and B, C, ... start to exist)
[A------]
         [B----]
         [C----]
         [...]

A created spin-off B (A continues to exist, B starts to exist)
[A-------------]
         [B----]

A, B, ... merged into C (A, B, ... cease to exist, C starts to exist)
[A------]
[B------]
[...]
         [C----]

A acquires B (and moves its assets into A; B ceases to exist)
[A--------]
[B--]
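
One way to hold the four temporal relationships above in data is to record which organisations exist before and after each event, and derive from that which ones cease and which are created. This is only a sketch of a possible representation: the class LifecycleEvent and its field names are hypothetical and were not proposed at the workshop.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class LifecycleEvent:
    """Hypothetical record of a change in which organisations exist."""
    label: str
    before: Tuple[str, ...]  # organisations existing before the event
    after: Tuple[str, ...]   # organisations existing after the event

    def ceased(self) -> FrozenSet[str]:
        """Organisations that stop existing at this event."""
        return frozenset(self.before) - frozenset(self.after)

    def created(self) -> FrozenSet[str]:
        """Organisations that start existing at this event."""
        return frozenset(self.after) - frozenset(self.before)


split = LifecycleEvent("split", ("A",), ("B", "C"))
spin_off = LifecycleEvent("spin-off", ("A",), ("A", "B"))
merger = LifecycleEvent("merger", ("A", "B"), ("C",))
acquisition = LifecycleEvent("acquisition", ("A", "B"), ("A",))
```

In this representation a spin-off differs from a split only in that A appears on both sides, and an acquisition differs from a merger only in that the surviving organisation keeps its existing identifier.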


Identifying Public Bodies

Schemes like the OECD DAC Code List only include a small selection of public bodies, and tend to only include public bodies involved in aid donation. There are few definitive national lists of public bodies. Finding a clear definition of what constitutes a public body is also complicated. A public body could be defined as:

Public bodies are also liable to change over time, as departments are merged, renamed and restructured, and administrative boundaries reshaped. Identifying when public bodies should get new IDs, or when their old IDs should be retained, requires careful attention.

Two possible proposals have been put forward for developing identifier sets for public bodies:

Neither of these proposals fully resolves the problem of identifying public bodies, and further work may be needed on this.

Next Steps

There are a number of next steps from the workshop:
