
= Distributed Storage: Research =

= Requirements Research =

Other items (from the CouchDB wiki):

 * The system will shard the data by key and direct each write to the correct server (shard), so that writes scale: many writers can operate concurrently in a collision-free update environment.
 * Reads may scale beyond writes by using some form of replication for read-only clients.
 * If a master data-store node is lost, the client (or some proxy mechanism) can switch over to a new master data store that is up to date to within milliseconds, and the client will continue without a hitch.
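
The first requirement -- deterministic key-to-shard routing so writers never collide -- can be sketched in a few lines. This is purely illustrative: the server names and the hash choice are assumptions, not part of any tool discussed below.

```python
import hashlib

# Illustrative shard list -- these hostnames are made up.
SHARDS = ["store-a.example.org", "store-b.example.org", "store-c.example.org"]

def shard_for(key: str) -> str:
    """Hash the key and pick a shard, so writes spread evenly and
    every client routes the same key to the same server."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Because the mapping is a pure function of the key, any client (or proxy) can compute it independently, which is what makes collision-free multi-writer scaling possible.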

= Research on Existing Tools =

Main contenders at present are:

 * allmydata-tahoe: Python
  * wide-area distributed storage
  * reasonably mature
  * we have tested it out -- see ../Plan
 * iRods? -- setup seems to be rather non-trivial (no Debian package or the like, AFAICT)

= Existing Tools =

Riak

 * More a distributed key-value/db store (see SoftwareTools/Databases)
 * However, it is designed on standard DHT principles and seems like it could scale for data-store use
 * Riak can store blobs, but there is no auto-sharding and objects are recommended to stay under 50MB
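
Given the 50MB guideline and the lack of auto-sharding, storing a large blob would mean chunking it by hand, roughly as in this sketch; the `chunk/<sha1>` key scheme and the commented-out store call are hypothetical, not Riak API:

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8MB, well under the ~50MB guideline above

def chunk_blob(data: bytes) -> list:
    """Split a blob into content-addressed chunks and return the manifest
    of chunk keys. The manifest itself would be stored under the blob's
    own key so the blob can be reassembled later."""
    manifest = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = "chunk/" + hashlib.sha1(chunk).hexdigest()
        # riak_client.put("blobs", key, chunk)  # hypothetical store call
        manifest.append(key)
    return manifest
```

Content-addressed keys also deduplicate identical chunks for free, at the cost of having to garbage-collect unreferenced chunks yourself.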

Mongo

 * More a distributed key-value/db store (see SoftwareTools/Databases)
 * However, it is designed on standard DHT principles and seems like it could scale for data-store use
 * For discussion of how concretely to do this see: http://lists.okfn.org/pipermail/okfn-help/2010-June/000668.html

iRods

 * Open Source Data Grid, Helping People Organize and Manage Large Collections of Distributed Digital Data

Tranche

 * Java
 * Clients, data servers and routing servers
 * Built in support for licensing and versioning

Dynomite (Deprecated)

 * 2010-03-30: deprecated in favour of Riak or Cassandra
 * A clone of the Amazon Dynamo key-value store, written in Erlang.
 * Started in 2008 (it looks like), with commits as of mid-2009
 * Seems functional but still fairly rudimentary (Dec 2009)

Lustre

 * Sun-provided offering, but seems cluster-oriented
 * Active as of Autumn 2009

Cassandra

 * Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. It brings together the distributed-systems technologies from Dynamo and the data model from Google's BigTable. Like Dynamo, Cassandra is eventually consistent; like BigTable, it provides a ColumnFamily-based data model richer than typical key/value systems.

Questions:
 * Not clear how "distributed" it is. Appears to be data-centre oriented.
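
As a rough illustration of the Dynamo-style partitioning that Cassandra inherits, here is a toy consistent-hash ring with virtual nodes. It is a sketch only -- the node names and vnode count are made up, and real Cassandra uses its own partitioners:

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring of the Dynamo family. Each physical node
    owns many points ("virtual nodes") on the ring, which smooths the key
    distribution and limits how many keys move when a node joins or leaves."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next ring point."""
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]
```

Replication in Dynamo-style systems then places copies on the next N distinct nodes clockwise from the owner, which is what makes eventual consistency workable across failures.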

CouchDB

 * CouchDB: Erlang
 * NOT ACTUALLY DISTRIBUTED. Despite its website's claim to be "distributed", it does not seem to be distributed in the standard sense of the term -- i.e. it does not (seamlessly) distribute data over multiple nodes with built-in replication between nodes for robustness. (Sure, you can do your sharding by hand, but that is true of anything ...)
 * schemaless JSON storage (document-oriented)
 * processing (map/reduce)
 * good Python libraries
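
To illustrate the map/reduce view idea over schemaless JSON documents, here is a single-process toy; it is not the CouchDB API (real views are defined in JavaScript and evaluated inside the server), and the sample documents are invented:

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    """Toy version of a map/reduce view: map emits (key, value) pairs
    per document, reduce folds the values collected under each key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

# e.g. total size per document type:
docs = [{"type": "dataset", "size": 3}, {"type": "dataset", "size": 5}]
totals = map_reduce(docs, lambda d: [(d["type"], d["size"])], sum)
```

The appeal of this model for schemaless storage is that the map function decides per-document what to emit, so documents with different shapes can coexist in one database.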


Celeste

 * Celeste is a highly-available, ad hoc, distributed, peer-to-peer data store. The system implements semantics for data creation, deletion, arbitrary read and write in a strict-consistency data model.
 * In Java
 * Launched 2008

GlusterFS

 * GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware, such as an x86-64 server with SATA-II RAID and an Infiniband HBA.
 * Checked: 2009-04-20

ODS Briefcase

 * A WebDAV-based Unified Storage Solution that incorporates automated extraction and management of metadata

Tahoe

 * allmydata.org "Tahoe": a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software / Open Source license. The filesystem is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious.

Osprey

 * Osprey is a peer-to-peer enabled content distribution system. A metadata management system for software and document collections enables local and distributed searching of materials. Items are available for download directly via URL or indirectly via the BitTorrent peer-to-peer protocol.
 * 2009-07: does not seem to have been under active development for a while

Farsite

 * Seems to have shut down prior to 2007, when a retrospective paper was published
 * No associated open source code but seems close to what we are looking for (from retrospective eval):
 * The Farsite file system is a storage service that runs on the desktop computers of a large organization and provides the semantics of a central NTFS file server. The motivation behind the Farsite project was to harness the unused storage and network resources of desktop computers to provide a service that is reliable, available, and secure despite the fact that it runs on machines that are unreliable, often unavailable, and of limited security. A main premise of the project has been that building a scalable system requires more than scalable algorithms: To be scalable in a practical sense, a distributed system targeting 10^5 nodes must tolerate a significant (and never-zero) rate of machine failure, a small number of malicious participants, and a substantial number of opportunistic participants. It also must automatically adapt to the arrival and departure of machines and changes in machine availability, and it must be able to autonomically repartition its data and metadata as necessary to balance load and alleviate hotspots. We describe the history of the project, including its multiple versions of major system components, the unique programming style and software-engineering environment we created to facilitate development, our distributed debugging framework, and our experiences with formal system specification. We also report on the lessons we learned during this development.

Dynamo

 * Amazon's system - not open-source

Pastry

 * Originally developed at Microsoft Research
 * Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.
 * Also has useful set of links to related projects
 * Software implementation (java)
 * Last updated: March 2009 (v2.1)

Oceanstore

 * DHT overlay network (?) -- development on Tapestry, its underlying overlay, has ceased; it has been replaced by Chimera. The last update of Chimera seems to be from around early 2008.
 * Providing Global-Scale Persistent Data
 * Download files from: oceanstore.sourceforge.net (the prototype is called Pond)
 * Not updated since 2003/2004 and no active installations AFAICT
 * Also gave rise to:
  * Bamboo -- "A Robust, Open-Source DHT"

OpenAFS

 * Active as of Dec 2009 and has a long development history
 * AFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for federated file sharing and replicated read-only content distribution, providing location independence, scalability, security, and transparent migration capabilities. AFS is available for a broad range of heterogeneous systems including UNIX, Linux, Mac OS X, and Microsoft Windows.

MogileFS

 * Seems active as of Dec 2009 having moved to Google code
 * Used in production (created by livejournal and used by them)
 * Installation: application level
 * Written in Perl

More

 * Google File System
  * proprietary, non-POSIX-compliant, and oriented towards Google's particular production environment
  * see also Google's BigTable work
 * Jetfile
  * old: last update seems to be around 1999
  * multicast distributed file system
 * Nodezilla
  * Technically, Nodezilla is a secured, distributed and fault-tolerant routing system (aka grid network). Its main purpose is to serve as a link for distributed services built on top of it (chat, efficient video multicast streaming, file sharing, secured file store ...). Nodezilla provides cache features; any server may create a local replica of any data object. These local replicas provide faster access and robustness to network partitions. They also reduce network congestion by localizing access traffic. It is assumed that any server in the infrastructure may crash, leak information, or become compromised; therefore, to ensure data protection, redundancy and cryptographic techniques are used.
 * PVFS
  * PVFS brings state-of-the-art parallel I/O concepts to production parallel systems. It is designed to scale to petabytes of storage and provide access rates at 100s of GB/s.
  * Active development (as of Dec 2009)

Im so glad that the internet alwlos free info like this!