On this page, we provide detailed data from the industry-standard YCSB benchmarks executed on HyperDex, Cassandra and MongoDB to provide insights into how HyperDex achieves its high performance. If you land on this page from some other website or search engine, please make sure to check out the basic performance benchmarks for the overall picture.
Recall that the Session Store benchmark specifies an update-heavy workload consisting of 50% read operations and 50% write operations. The CDF graph sheds some light on the distribution of latencies of individual operations.
The chart on the right shows the number of operations, expressed as a percentage of the total number of operations, that complete in less than a particular number of milliseconds. For example, the graph shows that 99% of HyperDex read operations complete in less than one millisecond. In general, the closer to the top-left corner a curve is on these CDFs, the lower and more consistent the system's latency is.
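To make the reading of these charts concrete, the following is a minimal sketch of how such a CDF can be computed from raw per-operation latencies. The function names and the sample values are illustrative assumptions; they are not taken from YCSB or from our measurements.

```python
from bisect import bisect_left

def fraction_below(latencies_ms, threshold_ms):
    """Fraction of operations that complete in strictly less than threshold_ms."""
    ordered = sorted(latencies_ms)
    return bisect_left(ordered, threshold_ms) / len(ordered)

def cdf_points(latencies_ms):
    """(latency, cumulative fraction) pairs, one per sample, ready for plotting."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

# Made-up latencies in milliseconds, for illustration only.
sample = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 4.0]
print(fraction_below(sample, 1.0))  # 0.9
```

A curve built from `cdf_points` that rises steeply near the left edge corresponds to a system where most operations finish quickly; a long flat tail corresponds to a few slow outliers such as the 4.0 ms sample above.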
All three systems complete 99% of operations in just a few milliseconds. At this scale, the differences between Cassandra and HyperDex are difficult to see, so the chart to the left provides a closeup of the slowest 6% of operations. It shows that HyperDex completes a larger share of both read and write operations in less than 10 ms than either of the other systems. Latency directly affects the throughput available to clients, so by completing operations more quickly, HyperDex is able to complete more operations in the same amount of time. Cassandra completes nearly 100% of write operations almost immediately, because the YCSB benchmark configures Cassandra to provide weak consistency by default, but it incurs high latencies for reads, which limits its overall throughput.
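The link between latency and throughput can be sketched with a simple back-of-the-envelope model. It assumes a closed-loop client, where each client thread waits for one operation to finish before issuing the next; the function name and the numbers are hypothetical and are not measurements from these benchmarks.

```python
def closed_loop_throughput(num_clients, mean_latency_ms):
    """Operations per second when each client has one operation in flight.

    Each client completes 1000 / mean_latency_ms operations per second,
    so the total is that rate multiplied by the number of clients.
    """
    return num_clients * 1000.0 / mean_latency_ms

# Hypothetical numbers: 32 client threads at two different mean latencies.
print(closed_loop_throughput(32, 0.5))  # 64000.0
print(closed_loop_throughput(32, 2.0))  # 16000.0
```

Under this assumption, a system whose mean latency is four times lower serves four times as many operations to the same set of clients.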
The YCSB Photo Tags benchmark is a read-mostly workload consisting of 95% read operations and 5% write operations. The graph to the left shows a CDF of read and update operations' latencies for this benchmark, while the graph to the right zooms in on the slowest 6% of operations. Once again, HyperDex provides high throughput by completing more operations in under 10 ms than either Cassandra or MongoDB.
The graph to the left shows a CDF of read and update operations' latencies for this benchmark, while the graph to the right zooms in on the slowest 6% of operations. These detailed measurements show that HyperDex is consistently fast, while MongoDB takes more than 1 ms for more than 30% of the operations.
The User Status Updates benchmark poses a workload that inserts new objects into the NoSQL store 5% of the time and reads objects the remaining 95% of the time, where reads prefer to retrieve items that were recently inserted. The operation latency CDFs on the right very closely resemble those of Photo Tags. If any of the systems had special optimizations that favor, or penalize, recently inserted objects, its results on this benchmark would likely diverge from its results on the Photo Tags benchmark. We can see from this graph that, instead, all systems' results mimic their results from the Photo Tags benchmark. HyperDex has no special optimizations for any particular case; it is just fast overall.
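A read pattern that prefers recently inserted items can be sketched as follows. This is an illustrative approximation only: the 1/rank weighting and the name `pick_recent_key` are our assumptions, and this is not the key generator that YCSB itself uses.

```python
import random
from collections import Counter

def pick_recent_key(inserted_keys, rng):
    """Pick a key with weight 1/rank, where rank 1 is the newest insert.

    inserted_keys is in insertion order, so the last element is the newest.
    """
    n = len(inserted_keys)
    weights = [1.0 / (n - i) for i in range(n)]  # newest key gets weight 1
    return rng.choices(inserted_keys, weights=weights, k=1)[0]

# Hypothetical keys, inserted in order user0 .. user99.
keys = [f"user{i}" for i in range(100)]
rng = random.Random(0)  # seeded so that runs are repeatable
counts = Counter(pick_recent_key(keys, rng) for _ in range(5000))
print(counts.most_common(3))  # the newest keys dominate the reads
```

A store that caches or otherwise favors fresh writes would be expected to behave differently under this pattern than under a pattern that ignores insertion order, which is the comparison the paragraph above draws.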
The Threaded Conversations benchmark is heavily skewed towards Cassandra and MongoDB, in that what is naturally a retrieval by secondary attributes is instead implemented as a scan through the primary key space. Objects' keys are prefixed with the thread ID, obviating the need to retrieve objects by secondary attribute. In reality, changing an application so it performs retrieval by a primary key instead of a more natural retrieval by secondary attributes is not always possible. Nevertheless, we measure Cassandra and MongoDB with retrievals by primary key, playing to their strong suit.
In contrast, for HyperDex, we set the database up such that the retrieval is implemented as a search over a secondary attribute. The ability to search and retrieve not just by the primary key but any attribute is one of the critical design goals of HyperDex, and is substantially harder than retrieval by the primary key.
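The difference between the two retrieval styles can be illustrated with a small in-memory sketch. The data layout and both function names are hypothetical; the code does not use any Cassandra, MongoDB or HyperDex API, and the linear filter in `search_by_attribute` says nothing about how HyperDex actually implements search.

```python
from bisect import bisect_left

# Hypothetical posts: primary key -> attributes. Keys embed the thread ID.
posts = {
    "thread1:0001": {"thread": "thread1", "author": "alice"},
    "thread1:0002": {"thread": "thread1", "author": "bob"},
    "thread2:0001": {"thread": "thread2", "author": "alice"},
}

def scan_by_key_prefix(store, prefix):
    """Primary-key range scan: works only because keys start with the thread ID."""
    keys = sorted(store)
    result = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        result.append(key)
    return result

def search_by_attribute(store, name, value):
    """Search on any attribute: the layout of the primary key is irrelevant."""
    return sorted(k for k, attrs in store.items() if attrs.get(name) == value)

print(scan_by_key_prefix(posts, "thread1:"))
print(search_by_attribute(posts, "thread", "thread1"))
print(search_by_attribute(posts, "author", "alice"))
```

The first two calls return the same posts, but only the attribute search can also answer the third query, which the key layout was never designed for.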
Nevertheless, HyperDex substantially outperforms both Cassandra and MongoDB. It offers 1.8 times higher throughput than Cassandra. We are unable to provide a comparison to MongoDB because MongoDB was unable to finish this benchmark in the allotted time.
The User Database benchmark presents applications with a read-modify-write workload where each read-modify-write operation fetches the object stored under a particular key and then immediately stores a modified version of the object under the same key. The CDF to the right shows the latency of the read and read-modify-write operations. Not surprisingly, the read latency is strictly less than the read-modify-write latency as the former is a strict subset of the latter.
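The shape of a read-modify-write operation, and the reason it cannot be faster than the read it contains, can be sketched against a plain dictionary standing in for the data store. All names here are illustrative assumptions rather than YCSB or HyperDex code.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

def read(store, key):
    """Fetch the object stored under key."""
    return store[key]

def read_modify_write(store, key, modify):
    """Fetch the object, then immediately store a modified version under the same key."""
    value = read(store, key)      # the read is the first step of the operation
    store[key] = modify(value)    # the write adds to the latency already paid
    return store[key]

# Hypothetical user record.
store = {"user1": {"visits": 1}}
bump = lambda user: {**user, "visits": user["visits"] + 1}
updated, elapsed_ms = timed(read_modify_write, store, "user1", bump)
print(updated)
```

Because every read-modify-write begins with a full read, its latency is the read latency plus the cost of the write that follows.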
Cassandra has issues with this workload that are not exposed in the other benchmarks. Specifically, nearly a quarter of read-modify-write operations take longer than 1 ms, while nearly all complete within 2 ms. We were surprised at this result because the key selection follows the same distribution as in the Session Store benchmark, and Cassandra achieved sub-millisecond latency for more than 97% of operations in that benchmark.
MongoDB's read-modify-write performance in this benchmark more closely matches the insertion and update performance of the Photo Tags and User Status Updates benchmarks. While MongoDB completes nearly 80% of write operations in less than 1 ms in the Session Store benchmark, it takes nearly 3 ms to complete 80% of read-modify-write operations in this benchmark. This suggests to us that MongoDB's latency for any given workload depends on the ratio of reads to writes in the workload itself.
HyperDex's performance in this benchmark matches that seen in the Session Store, Photo Tags, and User Status Updates benchmarks. It consistently achieves low latency for all classes of operations.
Each of the above benchmarks was designed by the YCSB authors to mimic a particular style of web application. Real applications, however, do not present such fixed workloads and are likely to experience fluctuations in the number of reads and writes. In selecting a NoSQL data store, application architects must keep in mind the ability of the NoSQL system to adapt to changing workloads. The six CDFs below show the read and write latencies of the Session Store, Photo Tags, User Profile Cache, and User Status Updates benchmarks. Each graph shows the latency CDF of one kind of operation for the different workloads.
Intuitively, systems with predictable performance will produce consistent latency distributions across a variety of workloads. In the CDFs below, we can see that Cassandra's and HyperDex's read and write latencies follow similar distributions, regardless of whether the benchmark is update heavy or read heavy. In contrast, the latency of the fastest 95% of MongoDB operations is heavily dependent upon the read/write ratio of the benchmark.
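One way to put a number on how consistent two latency distributions are is the largest vertical gap between their CDFs (the two-sample Kolmogorov-Smirnov statistic). We did not use this measure to produce the graphs; the sketch below, with made-up samples and hypothetical function names, only illustrates the idea.

```python
from bisect import bisect_right

def ecdf(sorted_samples, x):
    """Fraction of samples that are less than or equal to x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)

def max_cdf_gap(samples_a, samples_b):
    """Largest vertical distance between the two empirical CDFs (0.0 to 1.0)."""
    a, b = sorted(samples_a), sorted(samples_b)
    # Both CDFs are step functions that change only at sample points,
    # so checking every sample point finds the largest gap.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# Made-up read latencies (ms) for one system under two workloads.
update_heavy = [0.2, 0.3, 0.4, 0.5]
read_heavy_same = [0.2, 0.3, 0.4, 0.5]
read_heavy_shifted = [1.2, 1.3, 1.4, 1.5]

print(max_cdf_gap(update_heavy, read_heavy_same))     # 0.0
print(max_cdf_gap(update_heavy, read_heavy_shifted))  # 1.0
```

A gap near 0 means the system's latency distribution barely changes between workloads; a gap near 1 means the two workloads see almost entirely different latencies.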
Using the industry-standard Yahoo! Cloud Serving Benchmark, we evaluated Cassandra, MongoDB and HyperDex. YCSB provides six core benchmarks which pre-date HyperDex and were developed without input from the HyperDex developers. In all six benchmarks, HyperDex achieves a higher throughput than either Cassandra or MongoDB. The latency of HyperDex's read and write operations is not influenced by the ratio of read/write operations; HyperDex is predictably fast.
But there is a lot more to selecting a NoSQL store than just performance. In particular, the functionality and the guarantees offered by a system directly affect the quality of applications one can build. On this front, HyperDex offers a combination of features and guarantees not offered by the other systems. Three of these features are worth highlighting.
First, HyperDex offers a consistency guarantee not found in other systems. Cassandra and MongoDB offer only "eventual consistency." Eventual consistency forces the application designer to always assume that data may be out of date, and therefore limits the guarantees that an application may provide (note that applications that treat an eventually consistent store as if it were strongly consistent are incorrect). In contrast, HyperDex offers the strongest consistency guarantee possible in a key-value store, known as "linearizability." Every GET is guaranteed to return the value written by the latest PUT. There is no opportunity for stale data.
Second, HyperDex offers a strong fault-tolerance guarantee. No operation is considered complete until a write has been propagated to all the replicas, and the amount of replication is under the programmer's control. The YCSB benchmark runs Cassandra and MongoDB with weaker fault tolerance guarantees where data is considered safely written when it has been propagated to just one of the replicas. HyperDex achieves higher performance while performing strictly more work.
Finally, HyperDex is easy to set up, configure and use. C, C++, Java and Python bindings enable access from popular platforms. Automatic cluster configuration works out of the box, with no need for manual intervention. In contrast, MongoDB requires extensive manual configuration and attention paid to minute yet critical settings. And Cassandra suffers from problems with its automatic configuration algorithm that lead to load imbalance (these problems were manually corrected for the benchmarks shown above).