Projects/Where Does My Money Go/Status
This page is intended for new developers.
Last updated 2010-06-30.
Data sets
- PESA - A report published by HM Treasury. We used this for our prototype, and data from it is still used for the "Long-term trends" view in the dashboard.
- CRA - The Country and Regional Analysis, published by HM Treasury. The dashboard is mainly based on this data.
- GLA - Greater London Authority expenditure over £1000. We use this as a second example for the data store.
- COINS - the HM Treasury database on which the CRA and PESA reports are mostly based. This data set is very large. Our medium-term ambition is to publish it in a comprehensible form.
We also have supporting data sets, which define the hierarchical coding systems used to classify spending. For more information, see our data page.
Project components
We maintain a web site. This republishes selected blog posts from the OKFN blog, and also includes its own static and dynamic content, linked from the right-hand panel. The web site is maintained using WordPress.
The most interesting part of the site is the dashboard, which is written in Flash, and provides informative and appealing visualisations of the CRA data set. We hope eventually to present other data sets in a similar form. The dashboard gets its data from the data store, via a REST interface and via an aggregator interface, both of which talk JSON and JSONP. The store also provides a simple web user interface, including a user interface for the aggregator, and basic search functionality.
We occasionally run topical sub-projects, such as Where are the cuts? and a stop-gap COINS browser. These peripheral projects are usually rapidly developed, and quite simple, but they often end up contributing code and ideas to the main project.
We make use of the OKFN wiki. In addition, we have an active mailing list with lots of subscribers, and a ticketing system which we use to keep track of the project tasks. These are linked from the developers' page, along with other useful resources.
We use an Etherpad for real-time collaboration during meetings where not all parties are in the same room; this also serves to minute the meetings. However, we generally try to record everything important somewhere else too, such as the mailing list, the wiki, or the ticketing system.
We keep our code in a Mercurial repository. To get a copy of the repository, type the following command:
hg clone https://username:password@knowledgeforge.net/okfn/wdmmg
Main elements of the site, as drawn by RGRP on 29 July 2010
Data model
Each data set (which we call a "Slice") takes the form of an OLAP cube. We call the axes of the cube "Keys" (the standard OLAP terminology is "Dimensions"). Examples of Keys include time, place, department, function, and so on. We call the ticks along those axes "EnumerationValues" (the standard OLAP terminology is "Members"). Examples of EnumerationValues include 2008-2009, SCOTLAND, Department for Work and Pensions, Health, and so on. The cells in the cube are filled with Transactions. In each Slice, the Transactions are non-overlapping and ideally exhaustive. The principal function of the data store is to organise the Keys and EnumerationValues, and to aggregate the Transactions in such a way as to hide irrelevant axes.
The data model is documented in the Mercurial repository, at "doc/data_model.dia". It is currently implemented using an SQL database.
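The cube vocabulary can be illustrated with a few lines of Python. This is only a sketch for newcomers, not the schema in "doc/data_model.dia" and not the store's SQLAlchemy code: the class layout, the method names "add_key" and "add_transaction", and the amount are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Key:
    """An axis of the cube (an OLAP "Dimension")."""
    name: str
    values: frozenset  # the allowed EnumerationValues (OLAP "Members")


@dataclass(frozen=True)
class Transaction:
    """One cell of the cube: an amount at a given coordinate."""
    amount: float
    coordinates: tuple  # sorted (Key name, EnumerationValue) pairs


@dataclass
class Slice:
    """One data set, such as the CRA."""
    name: str
    keys: Dict[str, Key] = field(default_factory=dict)
    transactions: List[Transaction] = field(default_factory=list)

    def add_key(self, name, values):
        # Hypothetical helper: register an axis and its ticks.
        self.keys[name] = Key(name, frozenset(values))

    def add_transaction(self, amount, **coordinates):
        # Reject coordinates that do not lie on a known axis and tick.
        for key_name, value in coordinates.items():
            if key_name not in self.keys:
                raise KeyError("unknown Key: %s" % key_name)
            if value not in self.keys[key_name].values:
                raise ValueError(
                    "unknown EnumerationValue %r for Key %s" % (value, key_name)
                )
        txn = Transaction(amount, tuple(sorted(coordinates.items())))
        self.transactions.append(txn)
        return txn


# Illustrative usage; the amount is made up, not real CRA data.
cra = Slice("cra")
cra.add_key("time", ["2007-2008", "2008-2009"])
cra.add_key("region", ["SCOTLAND", "WALES"])
txn = cra.add_transaction(100.0, time="2008-2009", region="SCOTLAND")
```

Storing the coordinates as sorted pairs keeps a Transaction hashable, which is convenient if cells are later grouped or cached.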
Technologies
We use Python as our programming language. We use the Pylons web framework. We use Genshi for templating. We use SQLAlchemy to access the database. We use Solr for fast searching.
We try to use in-house OKFN infrastructure where possible, not least to force ourselves to maintain and improve that infrastructure. We host published data sets on ckan.net. We use a tool called "datapkg" to download them. We also use a "Swiss army knife" package called "swiss". These tools are far less mature than the third-party tools, but that is counter-balanced by having the author on call, and by the fact that these tools provide something that no third-party toolset provides.
Functionality of the data store
We have written loaders that bring each data set into our chosen data model. These are found in the "wdmmg/wdmmg/getdata" Python package. We try to factor out common functionality. For example, there is a generic loader in the "wdmmg/wdmmg/lib/loader" Python package.
We maintain a web interface for humans browsing the data in the store.
We maintain a REST interface to allow third-party applications to browse the store in a machine-readable form. The only such third-party application so far is the dashboard.
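For readers unfamiliar with JSONP, the difference between the two response formats can be sketched as follows. This is not the store's actual controller code: the function name "render", the parameter name "callback", and the payload are assumptions made for illustration.

```python
import json
import re

# Accept only dotted JavaScript identifiers as callback names, so that a
# caller cannot inject arbitrary script through the callback parameter.
_CALLBACK_PATTERN = r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*"


def render(payload, callback=None):
    """Return (content type, body) for a JSON or JSONP response."""
    body = json.dumps(payload, sort_keys=True)
    if callback is None:
        # Plain JSON: the client parses the body itself.
        return "application/json", body
    if not re.fullmatch(_CALLBACK_PATTERN, callback):
        raise ValueError("invalid JSONP callback name")
    # JSONP: wrap the JSON in a call to a function the client has defined,
    # so the response can be loaded through a script tag.
    return "text/javascript", "%s(%s);" % (callback, body)


# Illustrative usage with a made-up payload.
json_response = render({"name": "cra"})
jsonp_response = render({"name": "cra"}, callback="handleSlice")
```

JSONP exists so that a client on another domain can load the data through a script tag; the whitelist on the callback name is a common precaution rather than something this page specifies.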
We maintain an aggregator interface to allow third-party applications to query the store in a high-level way. The aggregator provides two main services: it filters out irrelevant Transactions, and it sums over irrelevant Keys. The aggregator currently does the summing in real time, and uses a cache to achieve acceptable performance. This approach will not scale up to the COINS data set, and we plan to pre-compute aggregates in future.
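The two services of the aggregator, filtering and summing, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the store's implementation: the function "aggregate", its parameters, and the example amounts are invented, and the real aggregator runs against the SQL database with a cache in front of it.

```python
from collections import defaultdict


def aggregate(transactions, filters, breakdown_keys):
    """Filter Transactions, then sum over every Key not in breakdown_keys.

    transactions:   iterable of (amount, coordinates) pairs, where
                    coordinates maps Key name to EnumerationValue
    filters:        Key name -> required EnumerationValue
    breakdown_keys: the Keys that remain as axes of the result
    """
    totals = defaultdict(float)
    for amount, coordinates in transactions:
        # Service 1: drop Transactions that fail any filter.
        if any(coordinates.get(k) != v for k, v in filters.items()):
            continue
        # Service 2: project onto the wanted Keys; all other Keys
        # are summed away because they do not appear in the cell.
        cell = tuple(coordinates.get(k) for k in breakdown_keys)
        totals[cell] += amount
    return dict(totals)


# Made-up amounts for illustration only; not real CRA figures.
EXAMPLE = [
    (100.0, {"time": "2008-2009", "region": "SCOTLAND", "function": "Health"}),
    (50.0, {"time": "2008-2009", "region": "SCOTLAND", "function": "Education"}),
    (70.0, {"time": "2008-2009", "region": "WALES", "function": "Health"}),
    (30.0, {"time": "2007-2008", "region": "SCOTLAND", "function": "Health"}),
]

by_region = aggregate(EXAMPLE, {"time": "2008-2009"}, ["region"])
```

Because every query scans all Transactions, the cost grows with the size of the Slice, which is why real-time summing is unlikely to suit a data set as large as COINS.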
We are in the process of implementing a search interface. The search results are aggregates.
Data sources for search, as drawn by RGRP on 29 July 2010
OKFN service-oriented architecture, as drawn by RGRP on 29 July 2010
Read-only vs Editable
All data is currently read-only, except by running command-line tools on the server.
We would like to open up some parts of the data, so that they can be edited over the web. We have not yet decided which parts of the data model to open up. Opening up the Keys and EnumerationValues is conceptually difficult, but it has the most use cases; in particular, we would like to let people classify spending however they want to, and to correct errors in other people's classifications.
Making the data editable requires implementing some sort of version control, to cope with spam and vandalism, to provide an audit trail, and to prevent accidental mishaps. We currently think the best route to a version-controlled store is to replace the SQL database with an RDF-based store. Other OKF projects, especially OpenBiblio, have developed tools for this purpose. We have already attempted this transition once, but it did not go well. We intend to try again when the tools are more mature.
We expect some sort of system of user accounts and permissions will also be needed.
Although editing functionality is highly desirable, it is a long-term goal. It has not made it onto the plans for phases 1 and 2.
Future plans for the site, as drawn by RGRP on 29 July 2010
Project phases
For the purposes of funding, and to force us to launch regularly, the project is managed in phases. Between launches, we adopt a flexible development model, in which the schedule waits for the work. This is especially useful for design work. As we approach the launch at the end of a phase, we adopt a more rigid scope, plan and timetable, and we try to close off all loose ends. Each phase lasts about eight weeks.
Phase 1 ended on 10th May 2010, or thereabouts. At the time of writing (30th June 2010) planning for phase 2 is nearly complete and all backend work for phase 2 is done. We will soon be fixing the scope and estimating the timetable for the next launch.