General

What makes HyperDex different from all the other NoSQL systems out there?

HyperDex is unique in the NoSQL space because it offers ACID transactions, a rich API, strong consistency guarantees and fault tolerance.

This combination of properties stems from three technical breakthroughs:

  • HyperSpace Hashing: A new hashing technique that preserves data locality even as it enables data distribution within a cluster.

  • Value-Dependent Chaining: A new data replication technique that ensures that all operations are strictly ordered and reliably committed at all replicas before a positive response is returned to clients.

  • Linear Transactions: A new transaction management technique that enables NoSQL-style data distribution and sharding to be combined with ACID guarantees.
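The first bullet is easier to picture with a toy example. The sketch below illustrates the idea behind hyperspace hashing under heavy simplifying assumptions. It uses two string attributes (`first`, `last`). Each attribute is hashed onto an axis that is cut into a hypothetical number of slices, `BUCKETS`, and each grid cell counts as one region. It shows only two things: an object's location is determined by its attribute values, and a search that fixes some attributes needs to contact only the regions along the remaining axes. The mapping from regions to servers, replication, and everything else in the real system are omitted.

```python
import hashlib
from itertools import product

# Hypothetical toy parameter: each attribute axis is cut into BUCKETS slices,
# so a space with two attributes has BUCKETS * BUCKETS regions.
BUCKETS = 4


def coord(value: str) -> int:
    """Hash one attribute value onto its axis (0 .. BUCKETS-1)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def region_of(obj: dict, attrs: list) -> tuple:
    """An object lives in the region named by the hashes of all its attributes."""
    return tuple(coord(obj[a]) for a in attrs)


def regions_for_search(query: dict, attrs: list) -> list:
    """Regions a search must contact: each specified attribute pins one
    coordinate, and each unspecified attribute ranges over its whole axis."""
    axes = [[coord(query[a])] if a in query else range(BUCKETS) for a in attrs]
    return list(product(*axes))


ATTRS = ["first", "last"]
obj = {"first": "John", "last": "Smith"}
home = region_of(obj, ATTRS)

# Searching on one attribute touches a single "row" of regions, not all of them.
row = regions_for_search({"last": "Smith"}, ATTRS)
assert len(row) == BUCKETS and home in row
# Specifying every attribute pins the search to exactly one region.
assert regions_for_search(obj, ATTRS) == [home]
```

Objects that share a `last` value always hash to the same slice of that axis, which is the locality the bullet refers to.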

Transactions

Is HyperDex truly transactional?

Yes. HyperDex supports "multi-key transactions," that is, transactions involving multiple objects. A HyperDex client can begin a transaction, read and modify any number of objects in any order, and then commit the transaction. The commit either succeeds or aborts as one atomic unit, just as in a traditional centralized RDBMS.

This atomicity guarantee is coupled with an isolation guarantee: while inside a transaction, a client is isolated from changes that other transactions are applying concurrently.
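The two properties above, all-or-nothing commit and isolation, can be illustrated with a small model. The sketch below is not HyperDex's linear-transactions protocol. It is a generic optimistic concurrency control toy that assumes a single process and a hypothetical `Store`/`Transaction` API. Writes are buffered and installed together at commit. A commit aborts if another transaction changed data it read, so committed transactions behave as if each had run alone.

```python
import threading


class Conflict(Exception):
    """Raised at commit time when data the transaction read has since changed."""


class Store:
    """Hypothetical in-memory store mapping key -> (version, value)."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

    def begin(self):
        return Transaction(self)

    def current(self, key):
        # Caller is responsible for holding self.lock.
        return self.data.get(key, (0, None))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> version seen at first read
        self.writes = {}  # key -> value, buffered and invisible until commit

    def get(self, key):
        if key in self.writes:  # a transaction sees its own pending writes
            return self.writes[key]
        with self.store.lock:
            version, value = self.store.current(key)
        self.reads.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.store.lock:
            # Validation: abort if any object we read was overwritten meanwhile.
            for key, seen in self.reads.items():
                if self.store.current(key)[0] != seen:
                    raise Conflict(key)
            # Install every buffered write under the same lock: all or nothing.
            for key, value in self.writes.items():
                version, _ = self.store.current(key)
                self.store.data[key] = (version + 1, value)


store = Store()
setup = store.begin()
setup.put("tim", 500)
setup.put("joe", 0)
setup.commit()

t1, t2 = store.begin(), store.begin()
t1.put("tim", t1.get("tim") - 100)  # transfer 100 from tim to joe
t1.put("joe", t1.get("joe") + 100)
t2.put("tim", t2.get("tim") - 50)   # a concurrent withdrawal also reads tim
t1.commit()

aborted = False
try:
    t2.commit()  # tim changed after t2 read it, so validation fails
except Conflict:
    aborted = True  # none of t2's writes were applied
```

Optimistic validation is just one way to get these guarantees. It is used here because it fits in a few lines, not because HyperDex works this way.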

How does HyperDex differ from traditional databases (RDBMSs)? If I'm used to Oracle, MySQL or Postgres, what will look different?

HyperDex is a modern NoSQL store. It keeps all data sharded across a collection of machines, and uses novel techniques to coordinate the data on this cluster to provide its features. It differs from RDBMSs in its interface and architecture.

Interface

RDBMSs traditionally provide a declarative interface: clients express their queries in SQL, and the database must come up with an optimal query plan for executing each query. To compute such plans effectively, an RDBMS must maintain many additional data structures, which entails great complexity, with mixed results and limited performance.

HyperDex provides an imperative API: clients store and retrieve objects through direct calls whose costs are predictable and well-characterized.

Architecture

Historically, RDBMSs imposed a centralized organization on data, with all tables located on a single node. While recent databases have tried to move away from this model, the nature of the SQL interface makes it difficult for RDBMSs to scale. Further, RDBMSs employ heavyweight mechanisms for locking and for distributed transaction management that can limit performance.

HyperDex relies on a distributed, sharded architecture. Since data resides on a collection of servers, independent operations can execute in parallel, which aids scalability. Novel algorithms for placing data in a cluster while retaining its locality (hyperspace hashing), for replicating it consistently (value-dependent chaining), and for managing operations that span multiple keys (linear transactions) enable HyperDex to provide the same ACID properties as RDBMSs.

Transactions

Does HyperDex really offer ACID transactions of the kind that I am used to from RDBMSs?

Yes, the following code illustrates how one would implement the canonical account transfer example:

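import hyperdex.client

# Connect to the cluster (coordinator address and port are placeholders)
c = hyperdex.client.Client("127.0.0.1", 1982)

t = c.begin_transaction()
bal1 = t.get("accounts", "tim")["balance"]
bal2 = t.get("accounts", "joe")["balance"]
bal1 -= 100
bal2 += 100
t.put("accounts", "tim", {"balance": bal1})
t.put("accounts", "joe", {"balance": bal2})
t.commit()  # both updates take effect together, or neither does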

This example shows a single client modifying two accounts within one transaction. The updates to the "tim" and "joe" accounts will either take place together, atomically, or not take place at all.

There are other NoSQL stores that claim to have ACID transactions. How is HyperDex different?

Those stores do not support transactions involving more than a single object, and a sequence of atomic operations is not the same thing as a proper transaction. HyperDex supports atomic operations in addition to transactions. The following (legal, supported) HyperDex code, for instance, does not have the same effect as the transaction example above:

# c is a connected hyperdex.client.Client
c.atomic_sub("accounts", "tim", {"balance": 100})
c.atomic_add("accounts", "joe", {"balance": 100})

These two operations, while individually atomic, will not execute together as a single, indivisible unit. There may be a time window in which the net sum of money in the system is incorrect. Moreover, other operations may be interleaved between such individually atomic, but non-transactional, operations.
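The window described above can be simulated in a few lines. This sketch assumes a hypothetical single-process store whose only primitives are per-object atomic add and subtract, each guarded by a lock. It records the total balance that another client would see if it read between the two calls.

```python
import threading

# Hypothetical stand-in for a store that offers only per-object atomic operations.
balances = {"tim": 500, "joe": 0}
lock = threading.Lock()


def atomic_add(key, amount):
    with lock:  # each call is atomic on its own...
        balances[key] += amount


def atomic_sub(key, amount):
    with lock:
        balances[key] -= amount


def total():
    with lock:
        return sum(balances.values())


observed = []
atomic_sub("tim", 100)
observed.append(total())  # ...but a reader running between the two calls sees this
atomic_add("joe", 100)
observed.append(total())

print(observed)  # [400, 500]: 100 was briefly missing from the system
```

Each lock acquisition protects a single object update. Nothing ties the two updates together, and that missing link is exactly what a multi-key transaction provides.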

Scalability

Having a search contact a small number of nodes in the system seems to be against the idea of "big-data". Shouldn't all nodes in the system be put to work to make searches faster?

Perhaps not surprisingly, using all nodes does not scale as the cluster grows larger.

Imagine a hypothetical system with N nodes and C clients, each performing R requests per second. If each search touches all N nodes, the system performs NCR network round trips per second. If the system scales to 2N nodes and 2C clients, it performs 4NCR round trips per second. Such an approach may work well for small clusters, but because the cost grows quadratically (O(n^2) when nodes and clients grow together), scaling to large clusters becomes prohibitively expensive as network bandwidth quickly becomes the bottleneck.

Instead, imagine that each client's search contacts a constant number of nodes, where the constant is proportional to the number of results desired. Now the number of round trips per second is linear in the number of clients and independent of the number of servers. As a result, clusters can scale to be quite large while providing clients with lower latency and higher throughput than would be possible if every node were involved in every search.
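The arithmetic in the two paragraphs above can be checked directly. N, C and R are as in the text, with arbitrary example values. The per-search fanout K in the second model is a hypothetical constant chosen for illustration.

```python
def broadcast_round_trips(nodes, clients, rate):
    """Every search contacts all nodes: N * C * R round trips per second."""
    return nodes * clients * rate


def bounded_round_trips(clients, rate, fanout):
    """Every search contacts `fanout` nodes, regardless of cluster size."""
    return fanout * clients * rate


N, C, R = 10, 100, 50  # example values for the symbols in the text
K = 3                  # hypothetical per-search fanout

before = broadcast_round_trips(N, C, R)
after = broadcast_round_trips(2 * N, 2 * C, R)
print(before, after, after / before)  # 50000 200000 4.0

# With bounded fanout, total traffic grows only with the number of clients,
# and the load on each individual server stays flat when both double.
print(bounded_round_trips(C, R, K) / N,
      bounded_round_trips(2 * C, R, K) / (2 * N))  # 1500.0 1500.0
```

By contrast, under broadcast each server handles C * R round trips per second, so its load doubles whenever the client count doubles, no matter how many servers are added.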

How well does HyperDex scale when objects have many attributes?

HyperDex uses a technique called subspace partitioning to create many lower-dimensional spaces when objects have many attributes. This trick enables us to serve large spaces with relatively small clusters.
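Counting regions gives a rough sense of why smaller subspaces help. The sketch assumes that every attribute axis is cut into B buckets. It groups attributes into consecutive subspaces of at most three dimensions. Both the bucket count and the grouping are arbitrary assumptions, not how HyperDex chooses its subspaces. A single 12-dimensional space has B^12 regions, far more than a small cluster can host. Four 3-dimensional subspaces need only 4 * B^3.

```python
def regions_single_space(num_attrs, buckets):
    """One space spanning every attribute: buckets ** num_attrs regions."""
    return buckets ** num_attrs


def subspaces(attrs, dims):
    """Hypothetical grouping: consecutive runs of at most `dims` attributes."""
    return [attrs[i:i + dims] for i in range(0, len(attrs), dims)]


attrs = [f"attr{i}" for i in range(12)]
B = 4  # hypothetical number of buckets per attribute axis

print(regions_single_space(len(attrs), B))  # 16777216
groups = subspaces(attrs, 3)
print(len(groups), sum(B ** len(g) for g in groups))  # 4 256
```

The total region count now grows with the number of subspaces rather than exponentially with the total number of attributes.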

Consistency

What consistency guarantees does HyperDex provide?

HyperDex provides one-copy serializability for transactions and linearizability for key-based operations, consistency guarantees that are ordinarily found only in traditional RDBMSs.

This is in stark contrast with first-generation NoSQL data stores, which provide "eventual consistency."

HyperDex ensures that every operation is seen in the same order by every client connected to the system. Not "eventually" or "sometimes," but immediately and always. Once a client receives an acknowledgment for a PUT operation, every subsequent GET operation will see the updated object. Further, concurrent PUT operations will always be observed in the same order at all nodes. The same applies to committed transactions.
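An acknowledged PUT is visible everywhere because the write is replicated along a chain before the client is acknowledged. Value-dependent chaining extends classic chain replication by making an object's chain depend on its attribute values. The sketch below shows only the plain chain-replication core. It assumes a single process, synchronous forwarding, a hypothetical `Replica` class, and no failures. Writes enter at the head, which fixes their order, and the client receives its acknowledgment only after the tail has applied the write.

```python
class Replica:
    """Hypothetical replica holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain; None marks the tail

    def put(self, key, value):
        self.store[key] = value  # apply locally first
        if self.next is not None:
            return self.next.put(key, value)  # then forward down the chain
        return "ack"  # only the tail produces the acknowledgment


def make_chain(names):
    replicas = [Replica(n) for n in names]
    for pred, succ in zip(replicas, replicas[1:]):
        pred.next = succ
    return replicas


chain = make_chain(["head", "middle", "tail"])
ack = chain[0].put("tim", {"balance": 400})  # clients send writes to the head

# When the acknowledgment reaches the client, every replica already has the
# write, so a subsequent GET served by the tail returns it.
assert all(r.store["tim"] == {"balance": 400} for r in chain)
print(chain[-1].store["tim"])  # {'balance': 400}
```

Because every write passes through the head in a single sequence, all replicas apply concurrent writes in the same order.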

Searches will always see the result of operations which have returned to clients. Operations which are concurrent with a search may or may not have an effect on the result.

Fault-Tolerance

Computers fail. Will HyperDex still work?

Yes. HyperDex is designed to withstand as many failures as the application requires, and the level of fault tolerance is tunable by the system administrator. HyperDex guarantees consistency, availability in the presence of fewer than f faults, and partition tolerance for partitions that affect fewer than f nodes, where f is a user-tunable parameter.

Benchmarks

Why is HyperDex so much faster than other systems?

HyperDex's speed advantage stems from hyperspace hashing coupled with careful implementation techniques. For straightforward GET/PUT operations, the performance advantage comes largely from HyperDex's streamlined implementation. SEARCH operations, on the other hand, are faster because hyperspace hashing is superior to traditional techniques that build distributed indices. Finally, the speed of its transactional facilities is due to its innovative linear transactions technique.

Platforms

On which platforms does HyperDex run?

HyperDex servers run on 64-bit Linux servers. We provide binary packages for Ubuntu, Debian and Fedora, and source for other systems.

Do you have support for Windows clients?

Yes. We have a port of our client library that works on Windows 7. Contact us for details.

Do you have support for 32-bit platforms?

Yes. We have a user-contributed port of our client library that works on 32-bit platforms. Contact us for details.

Licensing

What are the licensing terms for HyperDex?

HyperDex is free and open source, available under the BSD license. The source for the entire core, which includes everything except the transactions, is available for free.

What are the licensing terms for HyperDex Warp?

The Warp technology, which supports the transactional features, is a proprietary add-on. We provide an evaluation copy for free. This copy has no fault-tolerance features, so it should not be deployed in production settings. Please contact us for commercial licenses to the Warp technology.

Support

Is there commercial support available for HyperDex?

Yes. We offer competitive support contracts for the HyperDex data store. Contact United Networks, LLC at:

Contact

I am interested in discussing HyperDex, potential API additions, use cases, etc. publicly with other users and developers, what should I do?

We have a Google Group called hyperdex-discuss for this purpose.

I have suggestions, ideas, or other thoughts I would like to share privately with the developers.

Please get in touch with the dev team at support@hyperdex.org.