On this page, we describe our benchmark setup in detail so that others may reproduce and validate our results.
$ git clone git://github.com/brianfrankcooper/YCSB.git
$ cd YCSB
$ git checkout 7b564cc3
$ mvn clean package
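Once built, YCSB drives each store through a load phase followed by a run phase. As a quick sketch of the general invocation pattern (the basic binding shown here is YCSB's built-in no-op client; the store-specific bindings and properties we used are described below):

$ ./bin/ycsb load basic -P workloads/workloada
$ ./bin/ycsb run basic -P workloads/workloada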
Both Cassandra and MongoDB required extensive manual effort to set up an optimized cluster. This section describes the additional effort we spent tuning these systems:
The latest Cassandra provides an automatic bootstrap mechanism, which is the easiest way to start up a cluster. But in our initial experiments (not reported here), we found that this mechanism often led to enormous load imbalances between the nodes, and consequently to significantly diminished performance. In these initial tests, we saw some nodes acquire as much as 60% of the key space while others owned less than 1%.
To avoid the problems we encountered in Cassandra's automatic
partitioning, we manually partitioned the Cassandra ring to ensure that every
node was responsible for an equal portion of the key space. To do this, set
the initial_token setting in each node's configuration file such
that each node owns 8.33% of the ring (one twelfth, for our twelve-node cluster).
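As a minimal sketch of what this looks like, assuming RandomPartitioner (the default in the Cassandra 1.1 series), evenly spaced tokens for a twelve-node ring are i * 2**127 / 12 and can be generated with a one-liner:

$ python -c 'print("\n".join(str(i * (2**127 // 12)) for i in range(12)))'

Each node then gets one of the resulting values in its cassandra.yaml, for example:

# cassandra.yaml on the second node (i = 1)
initial_token: 14178431955039102644307275309657008810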
We suggest that Cassandra users who are relying on the automatic
partitioning mechanism check their resulting ring structure and adjust their
rings as well for optimum performance. The token assignment scheme
changed between Cassandra 1.1 and Cassandra 1.2, so you'll want to consult the
latest Cassandra documentation.
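The ring structure can be inspected with nodetool, which prints each node's token and the fraction of the ring it owns:

$ nodetool -h localhost ring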
For the benchmark, the cluster contains a single keyspace called
usertable using SimpleStrategy and a replication factor of two.
Inside this keyspace, we create a columnfamily called data, the column family name YCSB's Cassandra client expects.
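A sketch of the corresponding cassandra-cli commands, following the setup that YCSB's Cassandra documentation describes:

create keyspace usertable
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = {replication_factor:2};
use usertable;
create column family data;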
For MongoDB, we shard the usertable collection on the _id of the objects, as that is what YCSB uses as the key.
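A sketch of the corresponding commands from a mongos shell (the database name ycsb is the YCSB MongoDB client's default, and is an assumption about our exact setup):

mongos> sh.enableSharding("ycsb")
mongos> sh.shardCollection("ycsb.usertable", { "_id": 1 })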
Allocate extra time for deploying the MongoDB cluster. By default, the Linux x86_64 binaries for version 2.2.2 reserve 5% of the total disk space for the replication log; on our cluster, this is approximately 20 GB per host.
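If that is more than you need, the reservation can be capped when starting each replica. A sketch (the replica set name and data path are placeholders; the size is given in megabytes):

$ mongod --replSet rs0 --oplogSize 1024 --dbpath /local/mongodb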
HyperDex was compiled from Git commit 186ad8c3 using the latest versions of its dependencies at the time. LevelDB was built with Snappy support, and both LevelDB and Snappy were compiled from the most recent source tarballs.
HyperDex required the least amount of configuration; our setup follows directly from the HyperDex quickstart guide. We deployed one HyperDex coordinator and twelve HyperDex daemons, each on a separate node. The exact commands to launch HyperDex were:
machine01 $ hyperdex coordinator
machine?? $ hyperdex daemon -d -c machine01 -D /local/hyperdex
There was no manual configuration of the HyperDex cluster; we relied solely on the automatic partitioning and setup of the default system. Workloads A-D and F use the following space:
space usertable
key k
attributes field0, field1, field2, field3, field4,
   field5, field6, field7, field8, field9
create 24 partitions
tolerate 1 failure
Workload E uses the following space and has "hyperclient.scannable=true" set for YCSB:
space usertable
key k
attributes int recno, field0, field1, field2, field3, field4,
   field5, field6, field7, field8, field9
subspace recno
create 24 partitions
tolerate 1 failure
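Each definition is registered with the coordinator by piping it to the add-space subcommand, as in the quickstart guide. A sketch (the file name is a placeholder for one of the definitions above, and the port is the coordinator's default):

$ hyperdex add-space -h machine01 -p 1982 < usertable.space

For workload E, the hyperclient.scannable property can be supplied to YCSB on the command line via its -p flag.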