Database daddy goes non-relational on NoSQL fanbois
Postgres and Ingres father Michael Stonebraker is answering NoSQL with a variant of his relational baby for web-scale data — and it breaks some of the rules he helped pioneer.
On Tuesday, Stonebraker’s VoltDB company is due to release its eponymous open-source OLTP in-memory database. It ditches just enough DBMS staples to be faster than NoSQL while staying on the right side of the critical ACID database compliance benchmark for atomicity, consistency, isolation and durability of data.
VoltDB claimed that the database is between five and ten times faster than NoSQL’s Cassandra on a Dell PowerEdge R610 cluster based on Intel’s Xeon 5550, and it said that VoltDB is 45 times faster than an Oracle relational database on the same hardware, with near-linear scaling on a 12-node cluster.
VoltDB is the fruit of the H-Store-project, a collaboration between MIT (Stonebraker’s academic home), Brown University, Yale University and Hewlett-Packard Labs that sought to build a next-generation transaction processing engine. An early H-Store prototype could out-perform a commercial DBMS by a factor of 80 on OLTP workloads, it was claimed.
VoltDB co-founder and chief technology officer Stonebraker ran afoul of web data movement while brewing H-Store, when he compared Google’s MapReduce to RDBMS and dared to say it was lacking.
He and computer science professor David DeWitt were pilloried by Google fanbois for “not getting” data in the cloud, after the two of them said MapReduce ignored many of the developments in parallel DBMS technology over the last 25 years.
It now seems that Stonebraker — the main architect on Ingres and Postgres and an adjunct professor of computer science at MIT — has answered the fans by reaching for the cloud without offering yet another NoSQL variant to an already crowded field. VoltDB is currently under test with 200 beta customers, and it has one paying user.
Like NoSQLers, VoltDB has bumped its speed by moving data into memory and off disc, and dumping features such as logging, locking, latching and buffer management that hinder performance and scalability in big deployments. VoltDB also adds distributed data partitioning for speed, with performance further helped by multi-core processors on chips and the additional memory in today’s servers.
To retain ACID compliance, VoltDB uses single-threaded partitions that run autonomously while the data itself is replicated in a cluster for high availability.
VoltDB vice president of marketing Andy Ellicott told The Reg: “The challenge is to remove all the overhead, but — unlike the NoSQL key value store guys — keep ACID properties, to make sure the database maintains integrity of data automatically and make sure the database can be accessed by the SQL.
“We will work between partitions in an ACID way — that separates us from the sharders, as opposed to writing roll-back logic. It’s true ACID.”
While VoltDB has followed the NoSQL key-value crowd by ditching DBMS features, it has held on to the relational data model. It has also hidden the underlying complexity of VoltDB that programmers would have encountered by writing their software to different clusters. The problem with other projects is that by dumping the relational model, they're forcing architects to find new ways of building their applications.
Programming to VoltDB is done through Java stored procedures, while VoltDB routes queries to different servers without the programmer’s application needing to know the topology of the cluster.
VoltDB is available as a community edition (free under a GPL license) and a commercial license, which costs $15,000 for a four-server cluster on an annual subscription. That price buys you roadmap updates, patches, and services. Each additional server is priced at $3,000. Updates to VoltDB are planned within the next year to let you automatically add clusters and repartitioning. ®
Logging in VoltDB
An in-memory undo log is kept for uncommitted transactions to allow rollback. This is the only purpose the log serves. HA is achieved through replication. The user can configure the replication factor (how many replicas of each partition to maintain).
Thank you for the comment, BlueGreen.
Interesting but you can get some way there already
I'd like the time to look over the papers propely but not going to happen.
Browsing the first from the link ('The End of an Architectural Era (It's Time for a Complete Rewrite)') they assume a very restricted set of transactions ("HStore requires the complete workload to be specified in advance, consisting of a collection of transaction classes") and you can effectively disable read locking (& probably latching too if you run in single-user mode, if db supports that) anyway. Given such constraints I bet I could make significant speedups to stuff I've worked on (assuming they needed it, which is another matter).
Don't understand how they can omit logging though ("One must continue to have an undo log, in case a transaction fails and needs to roll back. However, the undo log does not have to persist beyond the completion of the transaction. As such, it can be a main memory data structure that is discarded on transaction commit." eh? It maintains consistency from other sites but what if they're down?)
Still, interesting work.
good try to fix non-problem?
I would be surprised if speed of oracle is usually the problem. Most of our oracle apps are upgrades of slow but successful access solutions, which don't get a redesign*, and only get tuned for performance if they drag after going live.
* we used to have a database designer, but he got outsourced