
= Distributed Storage (Distributed File Systems) =

= Purpose =

To provide distributed storage infrastructure to the Foundation and other open knowledge projects. The project consists of two stages:


 * 1) /Research -- detailed list of existing technologies
 * 2) Implementation (software + service) - see /Plan

= Status: Incubating =

This project is associated with the Foundation's Infrastructure Working Group.

= Participate =


 * Via email: join the okfn-help list or email info [at] okfn [dot] org.
 * Mercurial repo: https://knowledgeforge.net/okfn/grid

= Project Members =


 * Rufus Pollock
 * Julian Todd
 * Will Waites
 * James Casbon

= What We Want (or What We Mean by Distributed) =

There is an addressable file-space (e.g. a virtual file-system) which is distributed over multiple machines (nodes). Key features:


 * Wide area: we have a preference for a wide-area system, i.e. we do not expect all the nodes to be in a single data-centre or on a single high-speed network but rather to be distributed across the Internet.
   * Even a single data-centre solution would be interesting though.
 * Robustness: data must not be lost if a given node (or even k nodes) disappears.
   * This implies replication, i.e. data must be automatically replicated across nodes.
 * Easy addition of nodes: it should be easy for an average sysadmin to install and configure a node (e.g. a Debian package should be available).
   * We want people to be able to easily "donate" nodes.
 * Share/shard-rebalancing: should have good re-balancing to handle (permanent) node entry and exit
 * Different file sizes: the system should be able to handle small and very large files (so files should be automatically sharded)
 * Availability: high guarantee of data availability (so the disappearance of a given node should not make data unavailable)
 * Open data focused: focused on data/content that is open so encryption/privacy is not a priority
 * F/OSS: must be free/open source software so we can build open services
 * Eventually consistent: strong concurrency/consistency guarantees are not required as long as the system is eventually consistent (we know our CAP)
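
To make the sharding, replication and re-balancing requirements above more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how this project or any particular system places data: the function names (`shard`, `score`, `place`) and the node names are hypothetical, and rendezvous (highest-random-weight) hashing is simply one well-known placement technique chosen for the example. Keeping k+1 replicas of each shard on distinct nodes means that the loss of any k nodes still leaves at least one copy.

```python
import hashlib
from typing import List


def shard(data: bytes, shard_size: int) -> List[bytes]:
    """Split data into fixed-size shards (the last one may be shorter)."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def score(node: str, shard_id: str) -> int:
    """Deterministic pseudo-random weight for a (node, shard) pair."""
    digest = hashlib.sha256(f"{node}:{shard_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(shard_id: str, nodes: List[str], replicas: int) -> List[str]:
    """Pick the `replicas` highest-scoring nodes for this shard.

    Rendezvous hashing: every shard ranks all nodes independently, so
    removing a node only affects the shards that were stored on it.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: score(n, shard_id), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    # Hypothetical donated nodes and a hypothetical file name.
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    pieces = shard(b"x" * 2500, shard_size=1024)
    for index, piece in enumerate(pieces):
        shard_id = f"example.dat:{index}"
        print(shard_id, len(piece), place(shard_id, nodes, replicas=3))
```

When a node leaves permanently, calling `place` again with the reduced node list keeps the surviving replicas where they are and names one replacement node for each affected shard, which is the kind of limited re-balancing the list above asks for. A real system would also need failure detection, data transfer and repair, none of which is shown here.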

= Background =


 * Since we first started Knowledgeforge, we have been looking for ways to store data more efficiently than putting it in svn or on disk.
 * Many of the Foundation projects require, or would be assisted by, the availability of sizable storage capacity.
 * We also need tools to perform simple backup and replication (both for robustness and to assist with Data Distribution).
 * Furthermore, these needs are common to many other projects in the open knowledge community.

= Created: 2009-04-08 =


 * 2006-10-29: first entries in ToolsWeNeed
 * 2007-11: investigation of allmydata
 * 2009-04-08: own dedicated project page
 * 2009-05: prototype grid.okfn.org launches using Tahoe
 * 2009-07: work on permissioning and alpha version of new webapp frontend built using Pylons