SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Battle of the rows

Stonebraker reckons the relational staples of logging, locking, latching and buffer management, which have helped pioneer and maintain a crucial feature of databases - data integrity according to the atomicity, consistency, isolation and durability (ACID) principles - have also become their biggest burden. The processing needed just to make these features work soaks up 90 per cent of a transaction's CPU cycles, slowing performance and wasting power.

The serial inventor's answer to this particular problem was initially VoltDB. His database speeds things up by holding data in memory and partitioning it across the cores and servers of a cluster. ACID is retained because each VoltDB partition is single-threaded and runs autonomously, while data is replicated across the cluster for high availability.
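To make the idea concrete, here is a minimal Python sketch of single-threaded, partitioned execution. It is an illustration only and not VoltDB's code or API: the names Partition, PartitionedStore, deposit and withdraw are hypothetical, routing by CRC32 is an assumption, and replication across the cluster is left out entirely.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical transaction type: a function that mutates one partition's rows.
Transaction = Callable[[Dict[str, int]], int]


@dataclass
class Partition:
    """One in-memory shard. Work on it runs one transaction at a time."""
    rows: Dict[str, int] = field(default_factory=dict)

    def execute(self, txn: Transaction) -> int:
        # Serial execution: the transaction has the shard to itself,
        # so no locks or latches are taken. Copying the dict is a
        # simplification that gives all-or-nothing behaviour: the copy
        # is swapped in only if the transaction finishes without error.
        working = dict(self.rows)
        result = txn(working)
        self.rows = working
        return result


class PartitionedStore:
    def __init__(self, partition_count: int) -> None:
        self.partitions: List[Partition] = [
            Partition() for _ in range(partition_count)
        ]

    def partition_for(self, key: str) -> Partition:
        # Stable routing: the same key always lands on the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def run(self, key: str, txn: Transaction) -> int:
        return self.partition_for(key).execute(txn)


def deposit(account: str, amount: int) -> Transaction:
    def txn(rows: Dict[str, int]) -> int:
        rows[account] = rows.get(account, 0) + amount
        return rows[account]
    return txn


def withdraw(account: str, amount: int) -> Transaction:
    def txn(rows: Dict[str, int]) -> int:
        rows[account] = rows.get(account, 0) - amount
        if rows[account] < 0:
            # Raising abandons the working copy, so nothing is applied.
            raise ValueError("insufficient funds")
        return rows[account]
    return txn
```

Because every key is owned by exactly one partition and each partition runs its transactions in order, two transactions never touch the same row at the same time. That is the property that lets this style of design drop locking and latching without giving up isolation.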

VoltDB claims to be 45 times faster than an Oracle relational database on a Dell PowerEdge R610 cluster based on Intel's Xeon 5550, with near-linear scaling on a 12-node cluster. VoltDB was the product of the H-Store project, a collaboration between Stonebraker's MIT home, Brown University, Yale University and Hewlett-Packard Labs.

Before VoltDB, there was Vertica. This used a column-oriented, shared-nothing architecture with a massively parallel processing (MPP) engine and data compression to reduce storage and speed queries. Vertica claims query results between 50 and 200 times faster than databases that store data in rows. Vertica started as the C-Store project, also with Brown and MIT, plus Brandeis University and the University of Massachusetts, Boston.
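A rough Python sketch shows why storing data by column can help with both query speed and compression. This is not Vertica's implementation: the sample table and the helper names to_columns and run_length_encode are made up for illustration, and run-length encoding is just one simple compression scheme chosen as an example.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data, stored row by row as a conventional table would be.
ROWS: List[Dict[str, object]] = [
    {"region": "east", "product": "disk", "units": 3},
    {"region": "east", "product": "tape", "units": 5},
    {"region": "west", "product": "disk", "units": 2},
    {"region": "west", "product": "tape", "units": 7},
]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row records into one list per column (assumes uniform keys)."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: List[object]) -> List[Tuple[object, int]]:
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded: List[Tuple[object, int]] = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


columns = to_columns(ROWS)

# An aggregate over one attribute reads a single list and never
# touches the region or product values.
total_units = sum(columns["units"])

# Values of one column sit next to each other, so repeats compress well.
compressed_regions = run_length_encode(columns["region"])
```

In the row layout, totalling the units means visiting every full record. In the column layout the same query scans one contiguous list, and a column with many repeated values, such as region here, shrinks to a handful of pairs.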

Stonebraker reckons columnar databases are quicker than row-oriented relational databases because they know what they are looking for. They don't need to waste time sorting through rows.

VoltDB, Vertica, and - soon - SciDB take Stonebraker into a growing tussle against NoSQL over which architecture is "right" in a fight for mindshare and for customers. SciDB, for its part, is itself listed as a NoSQL database.

Facing off against SciDB, Vertica and VoltDB in a range of scenarios are Hadoop, MapReduce, Cassandra, CouchDB, Amazon's SimpleDB and Memcached - the latter being the distributed memory caching companion to MySQL used for scale and speed. Helping push them are their creators such as Google and Amazon or startups like Cloudera, mega-scale customers such as Twitter and Facebook, and an army of evangelists convinced that NoSQL is the future.

Sparks flew between Stonebraker and the NoSQL movement in 2008, when the relational expert incensed MapReduce fans with a joint blog post with DeWitt that called MapReduce a "giant step backward in the programming paradigm for large-scale data intensive applications".

Stonebraker and DeWitt professed amazement at the hype over how MapReduce represented a "paradigm shift in the development of scalable, data-intensive applications" and called MapReduce a good idea for writing "certain types" of general-purpose computations but lacking many tools and features commonly associated with DBMS that users have come to depend on.

Bloggers stormed back, damning these "so-called" database experts for "not getting" data in the cloud and - like jealous suitors jumping to their lover's defense - demanded a retraction of this "highly inaccurate article" as if it had slandered their beloved MapReduce.

Most missed the point: Stonebraker and DeWitt weren't calling MapReduce a bad database. They were picking up on the fact that MapReduce - like its open-source clone Hadoop - is being used as if it were a database, with customers dumping more data into it on a daily basis and then needing to transact on and analyze that data. It's a problem that's been creeping into Memcached and NoSQL, with people now trying to make Memcached and NoSQL work with relational databases.

Was Stonebraker surprised by the flames?

"The NoSQL guys are people who know nothing about databases and their first reaction is to lash out, so I'm not surprised [by the reaction]," he said.

"Talk to the MapReduce guys and they are fanatical about 'not invented here'... MapReduce was written by people who don't understand databases at all," an unapologetic Stonebraker continued. "They produced a thing that worked for their crawling applications. MapReduce was written to support the processing pipeline behind Google."

Turning MapReduce and Hadoop into databases would take a long time and a huge rewrite to inject things like data repositories, indexes, query languages and updates.

Does he recant in the face of such a flaming? Far from it. He's as critical as ever.

"If you are over 35, you are over the hill apparently in math," he claimed. "In computer science, the grey beards like me are still viable, and it's for this reason that what goes around comes around. The young guys haven't seen it before and the problem with our computer science education system is the lessons from the past seem to get lost."

And, it would seem, Google agrees with him.

Accidental SQL supporter

Stonebraker's got little time for those who claim it's the language that's slowing down databases serving big data. Hadoop is written in Java, CouchDB in Erlang, and the in-memory key-value store Memcached in C. For Stonebraker, the interface is the problem, not the language. Hence VoltDB has been rewritten to remove 90 per cent of the overhead associated with OLTP.

"I'm not a particular fan of SQL but I don't mind it. Jettisoning it just to, say, 'get record' is a huge mistake."

Interestingly, Stonebraker wrote Ingres in QUEL and left SQL to Ellison. The industry, and history, swung behind SQL, helping catapult Oracle to today's number-one position, while Ingres didn't switch to SQL until version six in the mid 1990s - too late to catch Oracle.
