Software Tools/Databases

From Open Knowledge Foundation


Databases

A review of available database technologies, construed broadly (i.e. not just RDBMSs).

Diagram of NoSQL systems: http://blog.nahurst.com/visual-guide-to-nosql-systems

Plain Ol' Filesystem

NoSQL (Key/Value Stores)

Read the spreadsheet for a full overview:

As of early 2010, the codebases I've heard the best things about are:

Miscellaneous unformatted notes on NoSQL stores

MongoDB has a limit of 2 to 2.5 GB on 32-bit hardware, since it keeps the database mmap'd. On the other hand, it supports deep indexing of documents, which is crucial for storing an RDF/JSON serialisation in the database.
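To make the "deep indexing" point concrete, here is a minimal pure-Python sketch of one way RDF statements could be grouped into nested documents. The document layout (`_id`, `statements`, `p`, `o`) and both function names are assumptions made for illustration, not a format taken from these notes, and the code does not talk to MongoDB. The helper `values_at` only mimics the dotted-path notation that a deep index would be declared on.

```python
def triples_to_documents(triples):
    """Group (subject, predicate, object) triples into one nested document per subject.

    The layout is hypothetical: each document holds a list of statements,
    and each statement holds a list of object dictionaries.
    """
    by_subject = {}
    for s, p, o in triples:
        kind = "uri" if o.startswith(("http://", "https://")) else "literal"
        by_subject.setdefault(s, {}).setdefault(p, []).append({"type": kind, "value": o})
    return [
        {"_id": s, "statements": [{"p": p, "o": objs} for p, objs in preds.items()]}
        for s, preds in by_subject.items()
    ]


def values_at(node, path):
    """Collect every value reachable through a dotted path, descending into lists."""
    if not path:
        return [node]
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(values_at(item, path))
        return found
    if isinstance(node, dict):
        key, _, rest = path.partition(".")
        if key in node:
            return values_at(node[key], rest)
    return []


example_triples = [
    ("http://example.org/a", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/a", "http://purl.org/dc/terms/creator", "http://example.org/bob"),
    ("http://example.org/a", "http://purl.org/dc/terms/creator", "http://example.org/alice"),
]
example_docs = triples_to_documents(example_triples)
print(values_at(example_docs[0], "statements.o.value"))
```

The values found under a path such as `statements.o.value` sit two lists deep inside the document, which is why an index that can reach into nested lists matters for this storage pattern.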

Cassandra: rigid schema, key/value store, not document oriented. This makes it hard to follow the same pattern for storing RDF, since the value (the object part of statements) has to be a list, ideally of dictionaries. The objects could be stored serialised, but then they could not be indexed, so that is no good.

CouchDB and Riak could do the same as Mongo, but they don't have indexes in the same way. Any searching is done with map/reduce, and without an index this means touching all objects in the database, so it probably performs badly.
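The cost concern above can be illustrated with a toy, in-memory map/reduce. This is a sketch under stated assumptions: it is plain Python, not the CouchDB or Riak API, and the function names and the `creator` field are hypothetical. What it shows is that the map step visits every document, whatever the query is looking for.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run a toy map/reduce and report how many documents were touched."""
    groups = defaultdict(list)
    scanned = 0
    for doc in documents:  # no index: every document is visited
        scanned += 1
        for key, value in map_fn(doc):
            groups[key].append(value)
    results = {key: reduce_fn(key, values) for key, values in groups.items()}
    return results, scanned


def emit_creator(doc):
    """Map step: emit (creator, 1) for documents that have a creator."""
    if "creator" in doc:
        yield doc["creator"], 1


def count(key, values):
    """Reduce step: add up the emitted counts for one key."""
    return sum(values)


example_store = [
    {"_id": "a", "creator": "bob"},
    {"_id": "b", "creator": "alice"},
    {"_id": "c", "creator": "bob"},
    {"_id": "d"},
]
results, scanned = map_reduce(example_store, emit_creator, count)
print(results, scanned)
```

The `scanned` counter always equals the size of the store, which is the behaviour the note above expects to perform badly on large datasets.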

4store on 32-bit hardware has a limit as well, since it uses mmap heavily. Generally, if the ptree files grow too large it will blow up. The limit in practice will be higher than Mongo's, but it is still there.

The Mongo limit could be worked around by running multiple instances of the back-end and using sharding. This is usually done on a cluster, but it would work, I think, on a single host. Each shard would have the 2 GB limit. Not sure how well it would work.
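As a rough illustration of the single-host idea, the sketch below routes each key to one of several local instances with a stable hash. This is client-side routing, a deliberately simpler technique than MongoDB's own sharding machinery, and the instance list (localhost with consecutive ports) is a hypothetical layout, not a tested configuration.

```python
import zlib

# Hypothetical layout: four back-end instances on one host, one port each.
INSTANCES = [("localhost", 27017 + i) for i in range(4)]
PER_SHARD_LIMIT_GB = 2  # the per-instance limit quoted in the notes above


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable checksum of the key."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def instance_for(key, instances=INSTANCES):
    """Return the (host, port) pair that would hold the given key."""
    return instances[shard_for(key, len(instances))]


def total_capacity_gb(instances=INSTANCES, per_shard_gb=PER_SHARD_LIMIT_GB):
    """Upper bound on total data size if every shard is filled to its limit."""
    return len(instances) * per_shard_gb


print(instance_for("http://example.org/a"))
print(total_capacity_gb())
```

The capacity figure is only an upper bound: it assumes keys spread evenly across shards, and it says nothing about whether one 32-bit host copes well with several mmap-heavy processes at once.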

RDF

RDFLib

4store

Sesame

Associated Tools

OLAP

Cubulus

Pentaho

Data Processing (ETL)

ETL = Extract, Transform, Load

SnapLogic
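For readers new to the term, the three ETL steps can be sketched with nothing but the Python standard library. The CSV columns (`name`, `amount`), the `spending` table and the function names are all made up for illustration. Dedicated ETL tools do far more than this, so treat it as a minimal sketch of the pattern only.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: trim names, drop rows without a name, convert amounts to int."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue
        cleaned.append((name, int(row["amount"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS spending (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO spending VALUES (?, ?)", records)
    conn.commit()


example_csv = "name,amount\n Health ,100\n,5\nEducation,250\n"
example_conn = sqlite3.connect(":memory:")
load(transform(extract(example_csv)), example_conn)
print(example_conn.execute("SELECT name, amount FROM spending ORDER BY name").fetchall())
```

Keeping the three steps as separate functions means each one can be tested or swapped out on its own, for instance loading into a different store without touching the cleaning rules.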
