Feeds

MongoDB straps SQL to Google's MapReduce

One toasting too many for NoSQL?

High performance access to file storage

To NoSQLers he's the Devil who flames their work. Bring up his name while interviewing the CEO or founder of any NoSQL start-up, as I have, and the interviewee withers to a tight smile.

Say "Michael Stonebraker" to the database wizards of today, though, and they'll nod sagely at mention of the pioneer of relational database technology and main architect of INGRES; they believe the NoSQL pups of today are simply re-learning the hard lessons Stonebraker solved years ago.

Not so long ago, NoSQL was hailed by technology hipsters based both mentally and physically in Silicon Valley as the next evolutionary step of the database.

Stonebraker's relational baby had hit a wall, a system whose rows, columns, locks and triggers were unable to scale fast, cheaply or dynamically enough and unable to process fluidly enough the kind of unstructured data fragments Tweeting and Facebooking sent storming down pipe.

MongoDB, CouchDB, Cassandra, MapReduce, Hadoop and more: these were the future – scaling through software, not expensive hardware. Crucially, they also dispensed with needing to grapple with another language: SQL. CouchDB devs can code using Erlang instead while MongoDB uses C++.

The problem for NoSQL has been successfully breaking out from the high-octane, big-data worlds of Twitter and Facebook and into the every data world of enterprise IT. In this world, the job of building and running database systems cannot – as is currently the case with NoSQL – remain the preserve of a few rocket-scientist-type engineers. Here, salaried jobs rest on the fact database transactions operate reliably – and reliability is enshrined in the principles of ACID (atomicity, consistency, isolation and durability). But it is a principle that appears to have been sacrificed by NoSQL.

Yet the pendulum is swinging back, and I speak not just of greater tolerance for relational, as Reg regular Matt Asay writes here.

Later this year, MongoDB will creep closer to the world of relational and it will do so in a way that's designed to rectify one of the deficiencies in NoSQL pin-up MapReduce from Google.

MongoDB 2.2, which just hit testing and is due in a couple of months, introduces a programming framework that brings a particular SQL-like feature to this NoSQL database. That feature lets you easily group query results by one or more columns. Called the New Aggregation Framework, it will see MongoDB emulate the familiar SQL group-by function.

MongoDB uses Google's MapReduce for complex analytical tasks; MapReduce lets you batch process petabytes of data using parallel computing while abstracting away the complexity for the programmer. The "map" part of MapReduce provides data transformation while the "reduce" part, er, reduces...

MapReduce might be a rockstar for NoSQLers and an inspiration for Hadoop, but it's not good for batching up results. That's a problem because customers want the group-by functionality of SQL, the evil language used to manage data in relational databases. You can get this feature right now with MapReduce, yes, but not without custom coding some Javascript.

That's where the Framework comes in; while still feeding on MapReduce it provides a declarative programming to cut down on the amount of code you hack for queries. It also maps to C++.

Dwight Merriman, the CEO of 10gen, which provides MongoDB support and training, told The Reg on a trip to London last week: "We are building the Aggregation Framework to group by - that's consistent with the way people are using MongoDB. It's more like SQL in that it's declarative."

Merriman, who cut his teeth as co-founder and chief technology officer for DoubleClick - the mega ads network bought by Google in 2007 for $3.1bn - doffed his hat to SQL and relational and while saying MapReduce is capable of so much, he also conceded it's "a little verbose".

"SQL and relational are really good at reporting. This [the Framework] is rounding out the solution to be great at that too. SQL group by is very powerful but MapReduce is much more."

Merriman keeps the NoSQL faith, though. He believes the New Aggregation Framework can be even simpler than using SQL as it's implemented in databases such as Oracle. "It's cleaner to build a query," he said. "If you want to build a query for Oracle, for example, you have to do string concatenation to do the SQL statement. We are writing a query generator."

Merriman also defended NoSQL's ACID compromise. MongoDB can do atomic operations on a document level because the "majority" of cases are covered.

"You can do atomic operations on a document in MongoDB, it is durable and it's consistent and isolated but it's only ACID at the document level. It won't do it outside of doc because it could be on different servers," Merriman said. "You have enough ACID for an e-commerce system but you wouldn't build a general ledger system.'

While there might be trade offs for web apps many - especially those bears in enterprise IT - would disagree something can be "ACID enough" and argue something is ACID or not; surrendering ACID is especially risky because devs will obviously assume its properties will exist in the database they're targeting and will save their apps or the data should there be a problem. Without ACID in the database, extra care must be taken.

Articulating this concern is MarkLogic, an XML database provider of 10 years that's non-relational document store that now plugs into Hadoop but that plays also it old-skool by also adhering to ACID. Vice president of product strategy David Gorbet called it "dangerous" to drop ACID.

"Most people assume ACID is across all documents or entities in the data store. If you have to put an asterisk next to that it’s a buyer beware situation," he told us. "If you are building a web site and you know what the data model looks like you can make that decision... if you are building multiple applications on top of a single instance of data and you don’t know all the scenarios - that can be dangerous."

Despite such considerations, Merriman's company claims customers are buying in. It quotes Spanish telco Telefonica as one of its customers, spinning up seven MongoDB projects up from an initial one.

Have the hatchets been buried? MongoDB at least accepts relational has some good points, but respect is selective it seems. Expect more flames and sulphur. ®

High performance access to file storage

More from The Register

next story
Android engineer: We DIDN'T copy Apple OR follow Samsung's orders
Veep testifies for Samsung during Apple patent trial
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Batten down the hatches, Ubuntu 14.04 LTS due in TWO DAYS
Admins dab straining server brows in advance of Trusty Tahr's long-term support landing
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Half of Twitter's 'active users' are SILENT STALKERS
Nearly 50% have NEVER tweeted a word
Windows XP still has 27 per cent market share on its deathbed
Windows 7 making some gains on XP Death Day
Internet-of-stuff startup dumps NoSQL for ... SQL?
NoSQL taste great at first but lacks proper nutrients, says startup cloud whiz
Windows 8.1, which you probably haven't upgraded to yet, ALREADY OBSOLETE
Pre-Update versions of new Windows version will no longer support patches
Microsoft TIER SMEAR changes app prices whether devs ask or not
Some go up, some go down, Redmond goes silent
Red Hat to ship RHEL 7 release candidate with a taste of container tech
Grab 'near-final' version of next Enterprise Linux next week
prev story

Whitepapers

Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
HP ArcSight ESM solution helps Finansbank
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Mobile application security study
Download this report to see the alarming realities regarding the sheer number of applications vulnerable to attack, as well as the most common and easily addressable vulnerability errors.