Projects/Distributed Storage
From Open Knowledge Foundation
Distributed Storage (Distributed File Systems)
Purpose
To provide distributed storage infrastructure to Foundation and other open knowledge projects. Consists of two stages:
Status: Incubating
This project is associated with the Foundation's Infrastructure Working Group.
Participate
- Via email: join the okfn-help list or email info [at] okfn [dot] org.
- Mercurial repo: https://knowledgeforge.net/okfn/grid
Project Members
- Rufus Pollock
- Julian Todd
- Will Waites
- James Casbon
What We Want (or What We Mean by Distributed)
There is an addressable file-space (e.g. a virtual file-system) which is distributed over multiple machines (nodes). Key features:
- Wide area: we have a preference for a wide-area system, i.e. we do not expect all the nodes to be in a single data-centre or on a single high-speed network but rather to be distributed across the Internet.
- Even a single data-centre solution would be interesting though
- Robustness: data must not be lost if a given node (or even k nodes) disappears
- This implies replication, i.e. data must be automatically replicated across nodes
- Easy addition of nodes: it should be easy for an average sysadmin to install and configure a node (e.g. a Debian package should be available)
- We want people to be able to easily "donate" nodes
- Share/shard-rebalancing: should have good re-balancing to handle (permanent) node entry and exit
- Different file sizes: the system should be able to handle small and very large files (so files should be automatically sharded)
- Availability: high guarantee of data availability (so the disappearance of a given node should not make data unavailable)
- Open data focused: focused on data/content that is open so encryption/privacy is not a priority
- F/OSS: must be free/open source software so we can build open services
- Eventually consistent: strong concurrency/consistency guarantees are not required as long as the system is eventually consistent (we know our CAP)
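To make the robustness, sharding and re-balancing requirements above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the design of any particular system: it uses plain replication with rendezvous (highest-random-weight) hashing, which is not how tahoe stores data. The node names, the shard size and all function names are hypothetical. With k+1 replicas per shard, any k nodes can disappear without data loss, and a newly added node only takes over shards it now ranks highest for.

```python
from __future__ import annotations

import hashlib

SHARD_SIZE = 4  # bytes; deliberately tiny, hypothetical value for illustration


def shard(data: bytes, size: int = SHARD_SIZE) -> list[bytes]:
    """Split data into fixed-size shards (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def shard_id(chunk: bytes) -> str:
    """Content-derived identifier for a shard."""
    return hashlib.sha256(chunk).hexdigest()


def place(sid: str, nodes: list[str], replicas: int) -> list[str]:
    """Choose `replicas` nodes for a shard by rendezvous hashing.

    Every node gets a pseudo-random weight for this shard; the nodes
    with the highest weights hold the copies.
    """
    def weight(node: str) -> str:
        return hashlib.sha256(f"{node}:{sid}".encode()).hexdigest()

    return sorted(nodes, key=weight, reverse=True)[:replicas]


def placement(data: bytes, nodes: list[str], replicas: int) -> dict[str, list[str]]:
    """Map each shard id of `data` to the nodes that should hold it."""
    return {
        shard_id(chunk): place(shard_id(chunk), nodes, replicas)
        for chunk in shard(data)
    }


def survives(layout: dict[str, list[str]], lost: set[str]) -> bool:
    """True if every shard keeps at least one replica outside `lost`."""
    return all(set(holders) - lost for holders in layout.values())


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical
    layout = placement(b"open knowledge data", nodes, replicas=3)
    for sid, holders in layout.items():
        print(sid[:8], holders)
    # With 3 replicas, losing any 2 nodes leaves every shard readable.
    print(survives(layout, {"node-a", "node-b"}))
```

Rendezvous hashing is used here only because it keeps re-balancing small: when a node joins, each shard either stays where it was or moves one replica to the new node, and when a node leaves, the surviving replicas stay in place. A wide-area deployment would also have to deal with repair, node trust and bandwidth, none of which this sketch attempts.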
Background
- Since we first started Knowledgeforge we have been looking for ways to store data more efficiently than putting it in svn or on disk.
- Many of the Foundation projects require or would be assisted by the availability of sizable storage capacity
- We also need tools to perform simple backup and replication (both for robustness and to assist with Data Distribution)
- Furthermore these needs are common to many other projects in the open knowledge community
Created: 2009-04-08
- 2006-10-29: first entries in ToolsWeNeed
- 2007-11: investigation of allmydata
- 2009-04-08: own dedicated project page
- 2009-05: prototype grid.okfn.org launches using tahoe
- 2009-07: work on permissioning and alpha version of new webapp frontend built using pylons.