SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Battle of the rows

Stonebraker reckons that the relational staples of logging, locking, latching and buffer management, which helped pioneer and maintain a crucial feature of databases - data integrity according to the atomicity, consistency, isolation and durability (ACID) principles - have also become its biggest burden. The processing needed just to make these features work soaks up 90 per cent of a transaction's CPU cycles, slowing performance and wasting power.

The serial inventor's first answer to this particular problem was VoltDB. It speeds things up by holding data in main memory and partitioning it across the cores and servers of a cluster. ACID is retained because each partition is served by a single thread that runs autonomously, executing transactions one at a time without locks or latches, while data is replicated across the cluster for high availability.
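
A minimal Python sketch of the single-threaded-partition idea, assuming keys are hash-partitioned and every transaction touches only one partition. The names (`Partition`, `PartitionedStore`, `deposit`) are hypothetical and are not VoltDB's API; replication and multi-partition transactions are left out.

```python
import zlib
from collections import deque


class Partition:
    """Owns one slice of the data and runs its queued transactions one at a time."""

    def __init__(self):
        self.data = {}
        self.queue = deque()

    def run_all(self):
        # Serial execution: only one transaction touches self.data at a time,
        # so no locks or latches are needed inside the partition.
        results = []
        while self.queue:
            txn = self.queue.popleft()
            results.append(txn(self.data))
        return results


class PartitionedStore:
    """Hypothetical store that hash-partitions keys across single-threaded partitions."""

    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def partition_for(self, key):
        # crc32 is stable across runs, unlike Python's salted str hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def submit(self, key, txn):
        # Route the transaction to the single partition that owns its key.
        self.partitions[self.partition_for(key)].queue.append(txn)

    def run(self):
        # A real system would run each partition on its own core in parallel;
        # this sketch simply drains them in turn within one thread.
        return [p.run_all() for p in self.partitions]

    def get(self, key):
        return self.partitions[self.partition_for(key)].data.get(key)


def deposit(key, amount):
    """Hypothetical single-partition transaction: adjust one account balance."""
    def txn(data):
        data[key] = data.get(key, 0) + amount
        return data[key]
    return txn


store = PartitionedStore(4)
store.submit("alice", deposit("alice", 100))
store.submit("alice", deposit("alice", -30))
store.submit("bob", deposit("bob", 5))
store.run()
print(store.get("alice"), store.get("bob"))  # 70 5
```

Because each partition drains its queue serially, the two `alice` deposits are ordered by their position in the queue rather than by a lock manager.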

VoltDB claims to be 45 times faster than an Oracle relational database on a Dell PowerEdge R610 cluster based on Intel's Xeon 5550, with near-linear scaling on a 12-node cluster. VoltDB grew out of the H-Store project, a collaboration between Stonebraker's MIT home, Brown University, Yale University and Hewlett-Packard Labs.

Before VoltDB, there was Vertica. It uses a column-oriented, shared-nothing architecture with a massively parallel processing (MPP) engine and data compression to reduce storage and speed up queries. Vertica claims to return query results between 50 and 200 times faster than databases that store data in rows. Vertica began as the C-Store project, another collaboration between MIT and Brown, plus Brandeis University and the University of Massachusetts, Boston.
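
Column stores compress well because each column holds values of a single type, often sorted, so identical values sit next to each other. A short Python sketch of run-length encoding, one compression scheme commonly used by column stores; it is an illustration under that assumption, not Vertica's actual encoding, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def count_equal(runs, target):
    """Answer a COUNT ... WHERE col = target directly on the compressed runs."""
    return sum(length for value, length in runs if value == target)


# A sorted 'state' column, as a column store might keep it on disk.
state = ["CA", "CA", "CA", "MA", "MA", "NY"]
runs = rle_encode(state)
print(runs)                     # [('CA', 3), ('MA', 2), ('NY', 1)]
print(count_equal(runs, "MA"))  # 2
```

Note that `count_equal` never decompresses the column, which is part of why compression can speed queries up rather than slow them down.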

Stonebraker reckons columnar databases are quicker than row-oriented databases because they know what they are looking for: a query reads only the columns it needs, so they don't waste time sorting through entire rows.
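
A toy Python comparison that makes this concrete, assuming a four-column table and a query that sums just one column. Fields read are counted as a rough stand-in for I/O (real engines read disk blocks, not individual fields), and the table and function names are hypothetical.

```python
COLUMNS = ("id", "name", "dept", "salary")

# Row layout: each record is stored, and read, as a whole.
rows = [
    (1, "Ann", "eng", 120),
    (2, "Bob", "ops", 90),
    (3, "Cy", "eng", 110),
]


def row_store_sum(rows, column):
    """Sum one column from a row layout, counting every field fetched."""
    idx = COLUMNS.index(column)
    total, fields_read = 0, 0
    for row in rows:
        fields_read += len(row)  # the whole record comes off disk
        total += row[idx]
    return total, fields_read


def to_column_store(rows):
    """Re-lay the same table out as one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def column_store_sum(columns, column):
    """Sum one column from a column layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


print(row_store_sum(rows, "salary"))                      # (320, 12)
print(column_store_sum(to_column_store(rows), "salary"))  # (320, 3)
```

Both layouts return the same answer; the column layout simply touches a quarter of the data here, and the gap widens as tables get wider.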

VoltDB, Vertica and - soon - SciDB take Stonebraker into a growing tussle with NoSQL over which architecture is "right", in a fight for mindshare and for customers. SciDB has even been listed as a NoSQL database.

Facing off against SciDB, Vertica and VoltDB in a range of scenarios are Hadoop, MapReduce, Cassandra, CouchDB, Amazon's SimpleDB and Memcached - the latter being the distributed memory caching companion to MySQL used for scale and speed. Helping push them are their creators such as Google and Amazon or startups like Cloudera, mega-scale customers such as Twitter and Facebook, and an army of evangelists convinced that NoSQL is the future.
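
Memcached is commonly paired with MySQL using the cache-aside pattern: the application checks the cache first and only queries the database on a miss. A rough Python sketch under stated assumptions: a plain dict stands in for Memcached, an in-memory SQLite database stands in for MySQL, and the `users` table and `CacheAside` class are hypothetical.

```python
import sqlite3


class CacheAside:
    """Read-through cache in front of a SQL database (cache-aside pattern).

    A dict stands in for Memcached and SQLite for MySQL; both are
    substitutions for illustration only.
    """

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.db_reads = 0

    def get_name(self, user_id):
        if user_id in self.cache:  # cache hit: the database is not touched
            return self.cache[user_id]
        self.db_reads += 1         # cache miss: fall back to the database
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        self.cache[user_id] = name  # populate the cache for the next reader
        return name

    def set_name(self, user_id, name):
        # Write to the database first, then invalidate the stale cache entry.
        self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.conn.commit()
        self.cache.pop(user_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann')")
conn.commit()

users = CacheAside(conn)
users.get_name(1)
users.get_name(1)
print(users.db_reads)  # 1
users.set_name(1, "anne")
print(users.get_name(1), users.db_reads)  # anne 2
```

The cache gives speed, but keeping it consistent with the database is left entirely to application code such as `set_name`.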

Sparks flew between Stonebraker and the NoSQL movement in 2008, when the relational expert and DeWitt incensed MapReduce fans with a joint blog post calling MapReduce a "giant step backward in the programming paradigm for large-scale data intensive applications".

Stonebraker and DeWitt professed amazement at the hype over MapReduce as a "paradigm shift in the development of scalable, data-intensive applications". They called MapReduce a good idea for writing "certain types" of general-purpose computations, but said it lacked many of the tools and features commonly associated with DBMSs that users have come to depend on.
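
The kind of computation MapReduce handles well is a batch pass over many inputs, such as the canonical word count. A single-process Python sketch of the map, shuffle and reduce phases; the function names are hypothetical, and a real framework would spread each phase across many machines. Note what is absent: no schema, no index, no query language and no updates.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) for every word in one input document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def mapreduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(mapreduce(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```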

Bloggers stormed back, damning these "so-called" database experts for "not getting" data in the cloud and - like jealous suitors jumping to their lover's defense - demanded a retraction of this "highly inaccurate article" as if it had slandered their beloved MapReduce.

Most missed the point: Stonebraker and DeWitt weren't calling MapReduce a bad database. They were picking up on the fact that MapReduce and its open-source clone Hadoop are being used as if they were databases, with customers dumping more data into them every day and then needing to transact on and analyze that data. The same problem has been creeping into Memcached and NoSQL, with people now trying to make them work with relational databases.

Was Stonebraker surprised by the flames?

"The NoSQL guys are people who know nothing about databases and their first reaction is to lash out, so I'm not surprised [by the reaction]," he said.

"Talk to the MapReduce guys and they are fanatical about 'not invented here'... MapReduce was written by people who don't understand databases at all," an unapologetic Stonebraker continued. "They produced a thing that worked for their crawling applications. MapReduce was written to support the processing pipeline behind Google."

Turning MapReduce and Hadoop into databases would take a long time and a huge rewrite to inject things like data repositories, indexes, query languages and updates.
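
Indexes are one example of what such a rewrite would have to add. Without one, a selective lookup means reading every record, which is what a plain MapReduce job does; with one, the database jumps straight to the matches. A toy Python comparison, assuming in-memory records and a hash index; the record layout and function names are hypothetical.

```python
from collections import defaultdict

# 1,000 hypothetical records; one in every hundred is in Boston.
records = [
    {"id": i, "city": "Boston" if i % 100 == 0 else "Austin"}
    for i in range(1000)
]


def full_scan(records, field, value):
    """No index: every record must be read to find the matches."""
    hits = [r for r in records if r[field] == value]
    return hits, len(records)


def build_index(records, field):
    """Map each field value to the positions of the records holding it."""
    index = defaultdict(list)
    for pos, record in enumerate(records):
        index[record[field]].append(pos)
    return index


def index_lookup(records, index, value):
    """With an index: fetch only the matching records.

    The count returned covers records fetched, not the index probe itself.
    """
    positions = index.get(value, [])
    return [records[p] for p in positions], len(positions)


hits, touched = full_scan(records, "city", "Boston")
print(len(hits), touched)  # 10 1000
city_index = build_index(records, "city")
hits, touched = index_lookup(records, city_index, "Boston")
print(len(hits), touched)  # 10 10
```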

Does he recant in the face of such a flaming? Far from it. He's as critical as ever.

"If you are over 35, you are over the hill apparently in math," he claimed. "In computer science, the grey beards like me are still viable, and it's for this reason that what goes around comes around. The young guys haven't seen it before and the problem with our computer science education system is the lessons from the past seem to get lost."

And, it would seem, Google agrees with him.

Accidental SQL supporter

Stonebraker has little time for those who claim it's the programming language that's slowing down databases serving big data. Hadoop is written in Java, CouchDB in Erlang, and the in-memory key-value cache Memcached in C. For Stonebraker, the interface is the problem, not the language. Hence VoltDB has been designed to remove the 90 per cent of overhead associated with OLTP.

"I'm not a particular fan of SQL, but I don't mind it. Jettisoning it just to, say, 'get record' is a huge mistake."
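
To illustrate what is given up, a small Python comparison using the standard library's `sqlite3` module for the SQL side and a hypothetical dict-backed `get_record` function for the key-value side. Both compute per-customer totals, but only the SQL version leaves the work to the database engine.

```python
import sqlite3

rows = [(1, "ann", 50), (2, "bob", 20), (3, "ann", 30)]

# Declarative SQL: state what you want; the engine decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
sql_totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# 'Get record' interface: a hypothetical key-value store that only fetches by key.
kv_store = {order_id: (customer, amount) for order_id, customer, amount in rows}


def get_record(key):
    return kv_store.get(key)


# The aggregation must now be hand-written in the application, assuming
# the application already knows every key it needs to fetch.
kv_totals = {}
for order_id in kv_store:
    customer, amount = get_record(order_id)
    kv_totals[customer] = kv_totals.get(customer, 0) + amount

print(sql_totals)                 # [('ann', 80), ('bob', 20)]
print(sorted(kv_totals.items()))  # [('ann', 80), ('bob', 20)]
```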

Interestingly, Stonebraker built Ingres around the QUEL query language and left SQL to Ellison. The industry, and history, swung behind SQL, helping catapult Oracle to today's number-one position, while Ingres didn't switch to SQL until version six in the mid 1990s - too late to catch Oracle.
