
SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID


Battle of the rows

Stonebraker reckons that the relational staples of logging, locking, latching and buffer management have also become the relational database's biggest burden. These are the features that helped pioneer and maintain one of its crucial properties: data integrity according to the atomicity, consistency, isolation and durability (ACID) principles. The processing needed just to make them work soaks up 90 per cent of a transaction's CPU cycles, slowing performance and wasting power.

The serial inventor's first answer to this problem was VoltDB. It speeds things up by holding data in main memory and partitioning it across the multi-core processors and servers of a cluster. ACID is retained because each VoltDB partition is single-threaded and runs autonomously, while data is replicated across the cluster for high availability.
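The partitioning idea can be sketched in a few lines of Python. This is a minimal toy model, assuming simple hash routing by key: each partition owns its slice of the data and runs transactions one at a time, so it needs no locks. The names `Partition`, `Cluster` and `deposit` are hypothetical, not VoltDB's API, and replication is left out.

```python
import zlib
from collections import deque


class Partition:
    """A toy partition that owns one slice of the data and runs its work serially."""

    def __init__(self, pid):
        self.pid = pid
        self.data = {}        # key -> value, touched only by this partition
        self.queue = deque()  # pending transactions, executed one at a time

    def submit(self, txn):
        self.queue.append(txn)

    def run_all(self):
        # Single-threaded: each transaction runs to completion before the next
        # starts, so self.data needs no locks or latches.
        results = []
        while self.queue:
            txn = self.queue.popleft()
            results.append(txn(self.data))
        return results


class Cluster:
    """Routes each key to exactly one partition with a stable hash."""

    def __init__(self, n_partitions):
        self.partitions = [Partition(i) for i in range(n_partitions)]

    def route(self, key):
        # crc32 is stable across runs, unlike Python's randomised str hash.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]


def deposit(account, amount):
    """Hypothetical single-partition transaction: adjust one account balance."""
    def txn(data):
        data[account] = data.get(account, 0) + amount
        return data[account]
    return txn


cluster = Cluster(4)
for account, amount in [("alice", 100), ("bob", 50), ("alice", -30)]:
    cluster.route(account).submit(deposit(account, amount))

for partition in cluster.partitions:
    partition.run_all()

print(cluster.route("alice").data["alice"])  # 70
print(cluster.route("bob").data["bob"])      # 50
```

Every transaction in this sketch touches a single key, so it never has to coordinate with another partition. Transactions that span partitions need extra machinery that the toy does not attempt.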

VoltDB claims to be 45 times faster than an Oracle relational database on a Dell PowerEdge R610 cluster based on Intel's Xeon 5550, with near-linear scaling on a 12-node cluster. VoltDB was the product of the H-Store project, a collaboration between Stonebraker's MIT home, Brown University, Yale University and Hewlett-Packard Labs.

Before VoltDB, there was Vertica. This used a column-oriented, shared-nothing architecture with a massively parallel processing (MPP) engine and data compression to reduce storage and speed up queries. Vertica claims query results between 50 and 200 times faster than databases that store data in rows. Vertica started as the C-Store project, again with Brown and MIT, plus Brandeis University and the University of Massachusetts, Boston.
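Part of a column store's compression win comes from keeping each column's values together, so similar values sit side by side. Run-length encoding is one simple scheme column stores use. The Python below is a minimal illustration under that assumption, not Vertica's actual encoding code, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in runs:
        column.extend([value] * count)
    return column


# A sorted, low-cardinality column compresses into a handful of runs.
states = sorted(["MA", "NY", "MA", "CA", "NY", "MA"])
runs = rle_encode(states)
print(runs)  # [('CA', 1), ('MA', 3), ('NY', 2)]
```

Six stored values shrink to three runs here. The longer and more repetitive the sorted column, the bigger the saving, which is why the layout matters as much as the algorithm.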


Stonebraker reckons columnar databases are quicker than row-oriented relational databases because they read only the columns a query is looking for, rather than wasting time wading through entire rows.
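To illustrate the point, here is a toy Python comparison of how much data each layout touches when a query sums a single field. The "values read" counter is an assumed, rough stand-in for disk I/O, and the function names are hypothetical. Real engines read disk pages, not Python dicts.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def row_store_sum(rows, field):
    """Sum one field the row-store way: whole rows are read to get at it."""
    values_read = sum(len(row) for row in rows)
    return sum(row[field] for row in rows), values_read


def column_store_sum(columns, field):
    """Sum one field the column-store way: only that column is read."""
    column = columns[field]
    return sum(column), len(column)


rows = [
    {"id": 1, "region": "east", "amount": 120, "note": "repeat customer"},
    {"id": 2, "region": "west", "amount": 80, "note": "new account"},
    {"id": 3, "region": "east", "amount": 45, "note": "refund pending"},
    {"id": 4, "region": "north", "amount": 200, "note": "bulk order"},
]

print(row_store_sum(rows, "amount"))                 # (445, 16)
print(column_store_sum(to_columns(rows), "amount"))  # (445, 4)
```

Both layouts return the same answer, but the row layout reads every field of every row to get it. With four fields the gap is fourfold. Analytic tables with dozens or hundreds of columns widen it accordingly.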

VoltDB, Vertica and - soon - SciDB take Stonebraker into a growing tussle with NoSQL over which architecture is "right", in a fight for mindshare and for customers. SciDB itself has even been listed as a NoSQL database.

Facing off against SciDB, Vertica and VoltDB in a range of scenarios are Hadoop, MapReduce, Cassandra, CouchDB, Amazon's SimpleDB and Memcached - the latter being the distributed memory caching companion to MySQL used for scale and speed. Helping push them are their creators such as Google and Amazon or startups like Cloudera, mega-scale customers such as Twitter and Facebook, and an army of evangelists convinced that NoSQL is the future.

Sparks flew between Stonebraker and the NoSQL movement in 2008, when the relational expert incensed MapReduce fans by calling MapReduce a "giant step backward in the programming paradigm for large-scale data intensive applications" in a blog post written jointly with David DeWitt.

Stonebraker and DeWitt professed amazement at the hype over how MapReduce represented a "paradigm shift in the development of scalable, data-intensive applications". They called MapReduce a good idea for writing "certain types" of general-purpose computations, but said it lacked many of the tools and features commonly associated with DBMSs that users have come to depend on.

Bloggers stormed back, damning these "so-called" database experts for "not getting" data in the cloud and - like jealous suitors jumping to their lover's defense - demanded a retraction of this "highly inaccurate article" as if it had slandered their beloved MapReduce.

Most missed the point: Stonebraker and DeWitt weren't calling MapReduce a bad database. They were picking up on the fact that MapReduce and its open-source clone Hadoop are being used as if they were databases. Customers dump more data into them every day and then need to transact on and analyze that data. It's a problem that's been creeping into Memcached and NoSQL, too, with people now trying to make Memcached and NoSQL work with relational databases.

Was Stonebraker surprised by the flames?

"The NoSQL guys are people who know nothing about databases and their first reaction is to lash out, so I'm not surprised [by the reaction]," he said.

"Talk to the MapReduce guys and they are fanatical about 'not invented here'... MapReduce was written by people who don't understand databases at all," an unapologetic Stonebraker continued. "They produced a thing that worked for their crawling applications. MapReduce was written to support the processing pipeline behind Google."

Turning MapReduce and Hadoop into databases would take a long time and a huge rewrite to inject things like data repositories, indexes, query languages and updates.

Does he recant in the face of such a flaming? Far from it. He's as critical as ever.

"If you are over 35, you are over the hill apparently in math," he claimed. "In computer science, the grey beards like me are still viable, and it's for this reason that what goes around comes around. The young guys haven't seen it before and the problem with our computer science education system is the lessons from the past seem to get lost."

And, it would seem, Google agrees with him.

Accidental SQL supporter

Stonebraker's got little time for those who claim it's the language that's slowing down databases serving big data. Hadoop is written in Java, CouchDB in Erlang and the in-memory key-value cache Memcached in C. For Stonebraker, the problem lies in the overhead behind the interface, not in the query language. Hence VoltDB was designed to remove the 90 per cent of overhead associated with OLTP.

"I'm not a particular fan of SQL but I don't mind it. Jettisoning it just to, say, 'get record' is a huge mistake."

Interestingly, Stonebraker built Ingres around the QUEL query language and left SQL to Oracle's Larry Ellison. The industry, and history, swung behind SQL, helping catapult Oracle to today's number-one position, while Ingres didn't switch to SQL until version six in the mid-1990s - too late to catch Oracle.
