OGDCamp 2011 Organizational Identifiers Workshop

Notes from [OGDCamp 2011 Organizational Identifiers Workshop 2011-10-23]: Sunday 23rd October 2011, Warsaw, Poland

Rough notes can be found on EtherPad here

= Summary =
Many projects have a need for re-usable organisational identifiers which can be used to map together data about organisations from different sources, and to consistently identify an organisation within a dataset. This workshop, a satellite event of the 2011 Open Knowledge Foundation Open Government Data Camp, explored different existing efforts in the organisational identifier space, and identified a number of key principles and proposals for action to develop shared approaches, standards and infrastructures for organisational identifier schemes.

Present

 * Tim Davies (Practical Participation / Aid Info, Facilitator)
 * Chris Taggart (Open Corporates)
 * Ramine Tianati (Southampton University)
 * Rolf Kleef (Open for Change)
 * Alvaro Graves (Tetherless World Constellation / LOGD Project)
 * John Wonderlich (Sunlight Foundation)
 * Kaitlin Lee (Sunlight Foundation)
 * Rufus Pollock (Open Knowledge Foundation)
 * Freiderich Lindenberg (Open Knowledge Foundation / Open Spending)
 * Ruth Del Campo (New York Law School)
 * elf Pavlic? Derota?

By Skype for the first session:


 * Bill Anderson (Development Initiatives / IATI)
 * Dinesh Venkateswaran (Techsoup Global)
 * John Hecklinger (Global Giving)
 * James Robertson (Alterseed)

Key principles
The workshop identified a number of key principles for developing shared standards around organisational identifiers:
 * Use existing identifiers whenever they are available
 * New identifiers should only be created as a last resort. The standards should build on existing IDs issued to organisations. With the existing ID of an organisation it should be possible to work out its ID under the shared standard.
 * Develop mapping and resolution services to connect IDs, rather than proposing adoption of unique IDs. To address the challenge of two identifiers picking out the same organisation we will propose approaches to map the relationship of identifiers, and resolve one identifier to another.
 * Focus on simple solutions: Look for the minimal viable solution that will scale in future.
 * Use distributed approaches wherever possible: Avoiding the introduction of centralized identifiers.

What is needed from an organizational identifier: use cases and requirements
There are many different use-cases for organisational identifiers, with overlapping but different sets of requirements. Use cases discussed in the workshop included:

(1) Definitively identifying legal entities

Identifiers should relate directly to the instruments that bring an entity into being: i.e. company registration numbers. It would be helpful to record the relationships between entities, and capture details of their change over time (e.g. ‘X is member of group Y’, ‘X is owned by Y’, ‘X merged with Y’)

(2) Identifying conceptual entities

Although we might talk about ‘Microsoft’, there is no single legal entity which is ‘Microsoft’. Finding ways to relate identifiers to commonplace conceptual entities is useful in a number of cases. Answering the question ‘What is a company?’, or ‘What is a charity?’ turns out to be fairly complex when you are working across borders.

(3) Identifying national, international and super-national organisations

Some schemes only need to identify organisations within a specific jurisdiction, others need to identify organisations across borders, and even to identify international institutions which have no direct country-level registration or identifiers.

(4) Identifying organisations of a particular status

For example, a scheme may only need to cover charities. The nature of a charity varies between jurisdictions. In some, an association or company may exist as an informal or legal entity prior to registering as a charity (so charity is a status of an existing organisation); in others, charity registration may create a new organisation.

(5) Using legacy identifiers

A system may have an internal set of identifiers which are not mapped to shared organisational identifiers, but which the existing system can expose. It would be useful to find ways to map these onto existing shared identifiers.

(6) Providing identifiers where none exist, or none pick out the required concept

Some organisations which need to be identified do not have an existing identifier. For example:
 * non-constituted associations in the UK
 * charities in some countries where no registration scheme is available

Sometimes the scope of existing organisational identifiers does not match the scope required. For example:
 * an organisation's identifier may not clearly communicate the organisation's status (e.g. charity) because of limitations in the registration systems in operation in that organisation's country.

Existing Schemes
A number of existing proposals or schemes for organisational identifiers exist.

IATI Organisation Standard
The draft Organisational Standard of the International Aid Transparency Initiative is currently based on either using an organisational ID from the OECD Development Assistance Committee Code List (which covers a number of international organisations not otherwise registered, and a number of donor government departments), or an identifier of the form:

COUNTRY-IDENTIFICATION SCHEME-ID

Where COUNTRY is an ISO two-letter country code, IDENTIFICATION SCHEME is a code to identify the registration scheme in use, agreed with the IATI Secretariat or Technical Advisory Group, and ID is the identifier from that scheme.

So, for example, the US-based William and Flora Hewlett Foundation can be identified by:

US-EIN-941655673

Where EIN is a unique US identification scheme used by charities and companies. And the UK charity Development Initiatives can be identified by:

GB-COH-06368740

No mechanism is put forward in the draft IATI Organisation Identifier Standard for resolving identifiers to information about the given organisation, or for choosing which identifier to prefer if an organisation has more than one.

The IATI Organisation Identifier is also used as the basis for IATI Activity Identifiers, which take the form:

ORGANISATION ID-ACTIVITY ID

For example, an activity of Development Initiatives could have the ID:

GB-COH-06368740-DIPRA4
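As a rough illustration of the COUNTRY-SCHEME-ID pattern and the activity identifier built on top of it, the following Python sketch joins and splits the parts. The function names are hypothetical and not part of the IATI draft, and the sketch does not cover OECD DAC codes, which do not follow this three-part shape.

```python
def make_org_id(country: str, scheme: str, local_id: str) -> str:
    """Build a COUNTRY-SCHEME-ID organisation identifier."""
    return f"{country.upper()}-{scheme.upper()}-{local_id}"


def make_activity_id(org_id: str, activity_id: str) -> str:
    """Build an activity identifier by appending the activity ID to the organisation ID."""
    return f"{org_id}-{activity_id}"


def split_org_id(org_id: str) -> tuple:
    """Split into (country, scheme, id) on the first two hyphens only,
    so that any hyphens inside the ID part are preserved."""
    country, scheme, local_id = org_id.split("-", 2)
    return country, scheme, local_id


print(make_org_id("US", "EIN", "941655673"))          # US-EIN-941655673
print(make_activity_id("GB-COH-06368740", "DIPRA4"))  # GB-COH-06368740-DIPRA4
print(split_org_id("GB-COH-06368740"))                # ('GB', 'COH', '06368740')
```

Note that an activity identifier cannot be split back into organisation and activity parts by position alone if the organisation's own ID may contain hyphens.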

Open Corporates URIs
Open Corporates is compiling a database of corporate legal entities across jurisdictions, drawing on publicly available company data, either as open data, or scraped from web sites.

Open Corporates exposes data at http://opencorporates.com/companies/ using the format:

http://opencorporates.com/companies/COUNTRY OR DISTRICT/REGISTRATION NUMBER

Where COUNTRY OR DISTRICT is an ISO code that picks out a country, or a state/district if company registration in a particular country is handled at the sub-national level. For example, a US company registered in the District of Columbia with the registration ID L10053 will have the URL:

http://opencorporates.com/companies/us_dc/L10053

This URL returns human readable data about the company. Appending .xml, .json or .rdf to this URL will return machine readable data.

Open Corporates is specifically concerned with providing identifiers for companies.
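The URL pattern described here can be sketched in a few lines of Python. The helper name `company_url` is hypothetical; it only builds the URL string from the pattern in these notes and makes no network request.

```python
MACHINE_FORMATS = {"xml", "json", "rdf"}  # suffixes mentioned in the notes


def company_url(jurisdiction: str, registration_number: str, fmt: str = "") -> str:
    """Build an Open Corporates company URL, optionally with a machine-readable suffix."""
    if fmt and fmt not in MACHINE_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    url = f"http://opencorporates.com/companies/{jurisdiction.lower()}/{registration_number}"
    return f"{url}.{fmt}" if fmt else url


print(company_url("us_dc", "L10053"))          # human readable page
print(company_url("us_dc", "L10053", "json"))  # machine readable data
```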

Identity Hub
The Linking Open Government Data (LOGD) project at the Tetherless World Constellation has put forward a series of design principles for URIs for US Linked Government Data based on the URI template:

'http://' BASE '/' 'id' '/' ORG '/' CATEGORY ( '/' TOKEN )

Allowing, for example, URIs of the form:

http://BASE/id/us/fed/agency/Commerce/National_Oceanic_and_Atmospheric_Administration

Here BASE can be replaced with any service which can resolve the required URI and provide data about it.
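A minimal sketch of filling in this template in Python follows. It assumes one reading of the example URI that the notes do not spell out: ORG is `us/fed`, CATEGORY is `agency`, and the remaining path segments are TOKEN values, with more than one token allowed. The function name `logd_uri` is hypothetical.

```python
def logd_uri(base: str, org: str, category: str, *tokens: str) -> str:
    """Fill in 'http://' BASE '/id/' ORG '/' CATEGORY followed by any TOKEN segments."""
    segments = ["id", org, category, *tokens]
    return "http://" + base + "/" + "/".join(segments)


print(logd_uri("BASE", "us/fed", "agency", "Commerce",
               "National_Oceanic_and_Atmospheric_Administration"))
```

Because BASE is a parameter, the same organisation path can be pointed at any service that is able to resolve it.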

Global Giving Collaboration
Global Giving and other partners are working on the development of an identifier scheme particularly for use case 6: providing identifiers where none exist, or the scope of existing identifiers is inappropriate.

This is likely to involve facilitating registration of new identifiers for some organisations.

Wider Initiatives
There are a number of other actors, initiatives and other ongoing projects in the organisational identifier space. The workshop identified the following:
 * ORGPedia is focussed on identifying US companies and relating their different identifiers.
 * The European Union has announced plans to work on open and interoperable data from company registers.
 * Dun & Bradstreet provide the DUNS number to organisations that register. DUNS number data is proprietary. DUNS numbers can refer to a legal entity, divisions of that entity and individual branches.
 * Bloomberg Number created by Bloomberg.

= Working Groups =
The afternoon of the workshop involved focussed work on three topics:
 * The architecture of an organisation standard - Identifying key components of an identifier standard, and mechanisms for resolving identifiers to information and data on the entity identified.
 * Common terms and descriptions of organisation relationships
 * Identifying public bodies - Public bodies tend not to be registered like companies or charities are. We need a scheme to identify public bodies.

Architectures
This working group identified a possible architecture building upon the IATI Identifier model but suggesting:

Allowing for multiple namespaces
Such that an identifier could be of the form:

US-NY-DMV-AA-12345

Where US is the Country (top-level namespace), NY is a second-level namespace identifying State, DMV is a third-level namespace identifying the registration or identity scheme in use, AA identifies a set of categories within this identification scheme, and 12345 is the relevant identification number.

(Note: following reflection, it might be more appropriate to reverse the namespace so that identification scheme type is the top-level category, e.g. COH-GB-12345 for UK company, as this makes it easier to declare and use resolution services)

Providing a light-weight ‘authority list’ of namespaces
Namespaces would generally be hierarchical under countries, but a number of top-level namespaces would be provided, including OECD-DAC- and other relevant general identification schemes. A central point is needed to provide an authoritative list of namespaces.

In the medium term the authority list will need some governance structure, with a process for agreeing which namespaces are added, and registering resolution services against namespaces. This might consist of a small virtual committee of interested parties, working via a consensus-based e-mail list to respond to proposals for new namespaces.

Whilst the authority list could follow a DNS model and delegate control over a set of top-level namespaces to sub-authorities, this was deemed too complicated for initial implementation.

In the short term, a simple file will suffice listing:


 * Namespace (e.g. GB, or GB-COH)
 * Identifier Type - are identifiers in this namespace ‘registration IDs’ (e.g. company numbers, and as such authoritative identifiers of legal entities); or are these identifiers of another type (a minimal list of types would need to be identified)
 * Resolution services - a list of URI bases to which the ID portion of the identifier could be appended to fetch data about this organisation.
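To make the 'simple file' idea concrete, here is a minimal Python sketch that reads such a list from CSV and finds the namespace for an identifier. The column names, the `|` separator between resolver URIs, the type labels and the sample rows are all illustrative assumptions, not an agreed format. The namespace is matched by longest registered prefix, so that GB-COH wins over GB.

```python
import csv
import io

# Illustrative authority list; rows and type labels are assumptions for this sketch.
AUTHORITY_CSV = """namespace,identifier_type,resolvers
GB,other,
GB-COH,registration,http://opencorporates.com/companies/id/
US-EIN,registration,
"""


def load_authority_list(text: str) -> dict:
    """Parse the CSV into {namespace: {"type": ..., "resolvers": [...]}}."""
    authority = {}
    for row in csv.DictReader(io.StringIO(text)):
        resolvers = [uri for uri in row["resolvers"].split("|") if uri]
        authority[row["namespace"]] = {"type": row["identifier_type"], "resolvers": resolvers}
    return authority


def match_namespace(identifier: str, authority: dict) -> tuple:
    """Return (namespace, id_portion) using the longest registered namespace prefix."""
    best = None
    for namespace in authority:
        if identifier.startswith(namespace + "-"):
            if best is None or len(namespace) > len(best):
                best = namespace
    if best is None:
        raise KeyError(f"no registered namespace for {identifier}")
    return best, identifier[len(best) + 1:]


authority = load_authority_list(AUTHORITY_CSV)
namespace, id_portion = match_namespace("GB-COH-06368740", authority)
print(namespace, id_portion, authority[namespace]["resolvers"])
```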

Providing a resolution service standard
Anyone should be able to declare a service to resolve identifiers in a particular namespace.

For example, Open Corporates may declare that it will resolve any identifiers in company namespaces, and provide a base URI of http://opencorporates.com/companies/id/.

An application that has the ID GB-COH-06368740 could then look up in the authority list that Open Corporates provides a resolver, and could append the ID to the opencorporates.com base URI to fetch back data on the organisation in question.

Resolution services should return a standard set of data, including, wherever possible, details of related organisations.

The resolution standard should include provision of an ‘at_time’ parameter, so that if a resolution service is able to provide data for past periods this can be requested. For example, a consuming application may have data recording a transaction with a company in 2005. If it requests data on that company from a resolution service with a parameter such as ?at_time=2005-01-01, then the service should return the details of the company as of that time (if known).

With access to the authority list, and an existing organisational identifier from one of the namespaces registered in the authority list, anyone should be able to construct a standardised organisational identifier. If a resolution service is available for the namespace in question it should be possible to look up details on that organisation, which will hopefully include relationships of this identifier to other relevant identifiers.
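A resolution request of this kind could be assembled as in the Python sketch below. It assumes that the ID portion of the identifier is appended to the resolver's base URI and that at_time takes an ISO date, following the 2005-01-01 example. The function name `resolution_url` is hypothetical, and no request is actually sent.

```python
from datetime import date
from urllib.parse import quote, urlencode


def resolution_url(base_uri: str, id_portion: str, at_time: date = None) -> str:
    """Append the ID portion to a resolver base URI, with an optional at_time parameter."""
    url = base_uri.rstrip("/") + "/" + quote(id_portion, safe="")
    if at_time is not None:
        url += "?" + urlencode({"at_time": at_time.isoformat()})
    return url


base = "http://opencorporates.com/companies/id/"  # base URI used as the example in the notes
print(resolution_url(base, "06368740"))
print(resolution_url(base, "06368740", date(2005, 1, 1)))
```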

Other points from the group included

 * The need for a governance structure for the authority list as an ongoing role, and a one-off requirement for work to agree the standards for interchange of data.
 * Considering / rather than - as the separator, and providing a standard way to escape the separator. There was no conclusive view on this. Whichever is adopted, some method for escaping - or / in an identifier is required (e.g. // = /).
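The doubling convention mentioned above (// = /) can be sketched as follows in Python, for either choice of separator. This is one possible escaping scheme, not an agreed standard. Doubling is ambiguous if a part is empty or begins with the separator, so this sketch rejects such parts, and it assumes a single-character separator.

```python
def join_parts(parts: list, sep: str = "-") -> str:
    """Join identifier parts, doubling any separator that occurs inside a part."""
    for part in parts:
        if not part or part.startswith(sep):
            raise ValueError(f"part {part!r} cannot be escaped unambiguously")
    return sep.join(part.replace(sep, sep * 2) for part in parts)


def split_parts(identifier: str, sep: str = "-") -> list:
    """Split on single separators, reading a doubled separator as a literal one."""
    parts, current, i = [], [], 0
    while i < len(identifier):
        if identifier.startswith(sep * 2, i):
            current.append(sep)   # escaped separator inside a part
            i += 2
        elif identifier[i] == sep:
            parts.append("".join(current))  # real boundary between parts
            current = []
            i += 1
        else:
            current.append(identifier[i])
            i += 1
    parts.append("".join(current))
    return parts


encoded = join_parts(["GB", "COH", "06-368"])
print(encoded)               # GB-COH-06--368
print(split_parts(encoded))  # ['GB', 'COH', '06-368']
```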

Common terms
The common terms group worked on a preliminary typology of relationships between organisations that could provide the basis for some standard sets of information that resolution services should attempt to provide when they can.

Preliminary Typology of Relations
 * "Persistent relations"
   * Organisational
     * is member of (association/group/cabal)
     * is affiliated to
     * is organisational unit (department, etc) of
     * is shareholder of
     * is owner of (special case of above? wholly owner of)
 * "Contractual"
   * has contract with
   * owes money to (long-term debt)
   * is supplier to
   * licenses to
   * takes legal action against
   * donates to
 * "Temporal relations"
   * Split into
   * Spin-off
   * Merger
   * Acquisition

Note: "supplier to" and "donates to" indicate that there are multiple transactions, whether or not these are available as separate facts.

Temporal relationships also need to be captured:

Split into: A split into B, C, ... (A ceases to exist, and B, C, ... start to exist)

[A--] [B] [C] [...]

Spin-off: A created spin-off B (A continues to exist, B starts to exist)

[A-] [B]

Merger: A, B, ... merged into C (A, B, ... cease to exist, C starts to exist)

[A--] [B--] [...] [C]

Acquisition: A acquires B (and moves its assets into A; B ceases to exist)

[A] [B--]
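The four temporal relations differ mainly in which organisations cease to exist and which start to exist. The Python sketch below encodes just that distinction. The relation names and the sources/targets convention are hypothetical choices for this illustration, not terms agreed by the group.

```python
def effect(kind: str, sources: list, targets: list) -> tuple:
    """Return (ceases, starts): the organisations that stop and start existing."""
    if kind in ("split_into", "merger"):
        # A split into B, C / A, B merged into C: sources cease, targets start
        return set(sources), set(targets)
    if kind == "spin_off":
        # A created spin-off B: A continues, B starts
        return set(), set(targets)
    if kind == "acquisition":
        # A acquires B: A continues, B ceases
        return set(targets), set()
    raise ValueError(f"unknown temporal relation: {kind}")


def apply_event(existing: set, kind: str, sources: list, targets: list) -> set:
    """Return the set of organisations that exist after the event."""
    ceases, starts = effect(kind, sources, targets)
    return (set(existing) - ceases) | starts


print(sorted(apply_event({"A"}, "split_into", ["A"], ["B", "C"])))   # ['B', 'C']
print(sorted(apply_event({"A", "B"}, "acquisition", ["A"], ["B"])))  # ['A']
```

A resolution service that supports a time parameter would also need the date of each event, which this sketch leaves out.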

Identifying Public Bodies
Schemes like the OECD DAC Code List only include a small selection of public bodies, and tend to only include public bodies involved in Aid Donation. There are few definitive national lists of public bodies. Finding a clear definition of what constitutes a public body is also complicated. A public body could be defined as:
 * A body that is defined as a public body by law - although the legal definition of many public bodies is scattered across legal instruments and no clear lists exist in most jurisdictions. It then becomes important to look for other sources of relevant lists.
 * An institution in receipt of public budgets - but this may include private-public partnerships etc.
 * An institution subject to Freedom of Information laws - this allows a public bodies list to draw on work done by Freedom of Information portals such as WhatDoTheyKnow.com, but (a) is limited to countries with active FOI laws and campaigns who have compiled relevant lists; and (b) may not cover all relevant public bodies depending on the scope of particular FOI laws.
 * An institution that has a government website
 * The subject of a COFOG classification - COFOG is the UN standard Classification of the Functions of Government.

Public bodies are also liable to change over time, as departments are merged, renamed and restructured, and administrative boundaries reshaped. Identifying when public bodies should get new IDs, or when their old IDs should be retained, requires careful attention.

Two possible proposals have been put forward for developing identifier sets for public bodies:
 * Using COFOG to pick out functions of government at particular levels of administrative geography. For example, the code GB-COFOG-NN could be used to pick out the development function of the national UK government (COFOG code NN), which could be resolved to a particular departmental identifier if one were available. GB-OXF-COFOG-NN could be used to pick out the education department of Oxford County Council (where OXF is a code for Oxford County).
 * Providing identifiers at publicbodies.org - Building a list of public bodies on a country-by-country basis, drawing on the best available lists in any country.

Neither of these proposals fully resolves the problem of identifying public bodies and further work may be needed on this.

Next Steps
There are a number of next steps from the workshop:
 * Continue collaboration and dialogue to create a draft Organisational ID standard and key terms for data exchange - This could take place jointly between the OKFN ‘Open Companies’ and ‘Open Development’ working groups.
   * Consultation with the IATI Secretariat and Technical Advisory Group on the feasibility of adjustments to the IATI standard is required.
   * A timetable for a draft should be set.
 * Creating working demonstrations of an authority list and resolution services
   * Using minimal technologies such as Google Documents to create an initial authority list which could be consumed as CSV or XML
   * Develop a demonstration of resolving organisational IDs via a resolution service - Resolution services could/should build on the Google Refine API. This task needs to be adopted by someone.
 * Draft Proposal and Terms of Reference for an Organisational Identifiers Governance Group
   * Create and circulate a proposal for a governance group to oversee the standard and maintain the authority list.
   * Invite key partners to participate in establishing the group.