MongoDB straps SQL to Google's MapReduce

One toasting too many for NoSQL?

To NoSQLers he's the Devil who flames their work. Bring up his name while interviewing the CEO or founder of any NoSQL start-up, as I have, and the interviewee withers to a tight smile.

Say "Michael Stonebraker" to the database wizards of today, though, and they'll nod sagely at mention of the pioneer of relational database technology and main architect of INGRES; they believe the NoSQL pups of today are simply re-learning the hard lessons Stonebraker learned years ago.

Not so long ago, NoSQL was hailed by technology hipsters based both mentally and physically in Silicon Valley as the next evolutionary step of the database.

Stonebraker's relational baby had hit a wall: a system whose rows, columns, locks and triggers were unable to scale fast, cheaply or dynamically enough, and unable to process fluidly enough the kind of unstructured data fragments that Tweeting and Facebooking sent storming down the pipe.

MongoDB, CouchDB, Cassandra, MapReduce, Hadoop and more: these were the future – scaling through software, not expensive hardware. Crucially, they also dispensed with needing to grapple with another language: SQL. CouchDB devs can code using Erlang instead while MongoDB uses C++.

The problem for NoSQL has been successfully breaking out from the high-octane, big-data worlds of Twitter and Facebook and into the everyday data world of enterprise IT. In this world, the job of building and running database systems cannot – as is currently the case with NoSQL – remain the preserve of a few rocket-scientist-type engineers. Here, salaried jobs rest on the fact that database transactions operate reliably – and reliability is enshrined in the principles of ACID (atomicity, consistency, isolation and durability). But it is a principle that appears to have been sacrificed by NoSQL.

Yet the pendulum is swinging back, and I speak not just of greater tolerance for relational, as Reg regular Matt Asay has written.

Later this year, MongoDB will creep closer to the world of relational and it will do so in a way that's designed to rectify one of the deficiencies in NoSQL pin-up MapReduce from Google.

MongoDB 2.2, which just hit testing and is due in a couple of months, introduces a programming framework that brings a particular SQL-like feature to this NoSQL database. That feature lets you easily group query results by one or more columns. Called the New Aggregation Framework, it will see MongoDB emulate the familiar SQL group-by function.

MongoDB uses Google's MapReduce for complex analytical tasks; MapReduce lets you batch process petabytes of data using parallel computing while abstracting away the complexity for the programmer. The "map" part of MapReduce provides data transformation while the "reduce" part, er, reduces...

MapReduce might be a rockstar for NoSQLers and an inspiration for Hadoop, but it's not good for batching up results. That's a problem because customers want the group-by functionality of SQL, the evil language used to manage data in relational databases. You can get this feature right now with MapReduce, yes, but not without custom coding some JavaScript.
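To make the complaint concrete, here is a minimal sketch of a group-by written in the map/reduce style. It is plain single-process Python rather than the JavaScript that MongoDB's MapReduce expects, and the sample records and the function names (map_phase, reduce_phase, map_reduce) are invented for illustration. It shows only the shape of the custom code involved, not how MongoDB or Google implement it.

```python
from collections import defaultdict

# Hypothetical sample data: page-view records, invented for illustration.
records = [
    {"page": "/home", "views": 3},
    {"page": "/about", "views": 1},
    {"page": "/home", "views": 2},
]


def map_phase(record):
    """Emit (key, value) pairs; here, one pair per record."""
    yield record["page"], record["views"]


def reduce_phase(key, values):
    """Collapse every value emitted for one key into a single result."""
    return sum(values)


def map_reduce(data, mapper, reducer):
    # Shuffle step: collect the emitted values under their key.
    grouped = defaultdict(list)
    for item in data:
        for key, value in mapper(item):
            grouped[key].append(value)
    # Reduce step: one call per distinct key.
    return {key: reducer(key, values) for key, values in grouped.items()}


totals = map_reduce(records, map_phase, reduce_phase)
print(totals)
```

Even for a simple sum per key, the programmer has to write both a mapper and a reducer by hand, which is the verbosity at issue.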

That's where the Framework comes in: while still feeding on MapReduce, it provides a declarative programming model to cut down on the amount of code you hack for queries. It also maps to C++.
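As a rough illustration of what "declarative" means here, the sketch below describes a group-by as a piece of data and hands it to a toy evaluator. The $group and $sum operator names follow the shape used by MongoDB's aggregation pipeline, but the evaluator (run_group, run_pipeline) and the sample records are hypothetical, written only for this example, and support nothing beyond summing one field per group.

```python
# Hypothetical sample data, invented for illustration.
records = [
    {"page": "/home", "views": 3},
    {"page": "/about", "views": 1},
    {"page": "/home", "views": 2},
]

# The query is plain data: group by "page" and total the "views" field.
pipeline = [
    {"$group": {"_id": "$page", "total": {"$sum": "$views"}}},
]


def run_group(docs, spec):
    """Toy evaluator for one $group stage; handles only $sum over a field."""
    out = {}
    key_field = spec["_id"].lstrip("$")
    for doc in docs:
        key = doc[key_field]
        row = out.setdefault(key, {"_id": key})
        for name, expr in spec.items():
            if name == "_id":
                continue
            field = expr["$sum"].lstrip("$")
            row[name] = row.get(name, 0) + doc[field]
    return list(out.values())


def run_pipeline(docs, stages):
    # Each stage consumes the output of the previous one.
    for stage in stages:
        docs = run_group(docs, stage["$group"])
    return docs


result = run_pipeline(records, pipeline)
print(result)
```

The caller states what result is wanted and leaves the how to the engine, which is the contrast with hand-written map and reduce functions.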

Dwight Merriman, the CEO of 10gen, which provides MongoDB support and training, told The Reg on a trip to London last week: "We are building the Aggregation Framework to group by - that's consistent with the way people are using MongoDB. It's more like SQL in that it's declarative."

Merriman, who cut his teeth as co-founder and chief technology officer for DoubleClick - the mega ads network bought by Google in 2007 for $3.1bn - doffed his hat to SQL and relational. While saying MapReduce is capable of so much, he also conceded it's "a little verbose".

"SQL and relational are really good at reporting. This [the Framework] is rounding out the solution to be great at that too. SQL group by is very powerful but MapReduce is much more."

Merriman keeps the NoSQL faith, though. He believes the New Aggregation Framework can be even simpler than using SQL as it's implemented in databases such as Oracle. "It's cleaner to build a query," he said. "If you want to build a query for Oracle, for example, you have to do string concatenation to do the SQL statement. We are writing a query generator."
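Merriman's contrast can be sketched as follows. Both helper functions are hypothetical and exist only to show the difference in style: one assembles a SQL statement by gluing strings together, the other builds the equivalent query as a nested structure. The $group and $sum names follow MongoDB's aggregation pipeline; the table and field names are invented.

```python
def sql_by_concatenation(table, group_col, sum_col):
    """Build a GROUP BY statement by string concatenation.

    Illustration only: assembling SQL from unchecked strings is fragile.
    """
    return (
        "SELECT " + group_col + ", SUM(" + sum_col + ") AS total"
        + " FROM " + table
        + " GROUP BY " + group_col
    )


def group_stage(group_field, sum_field):
    """Build the equivalent grouping as a structured document."""
    return {
        "$group": {
            "_id": "$" + group_field,
            "total": {"$sum": "$" + sum_field},
        }
    }


statement = sql_by_concatenation("hits", "page", "views")
stage = group_stage("page", "views")
print(statement)
print(stage)
```

The structured form can be composed and inspected as ordinary program data, with no statement text to assemble or parse.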

Merriman also defended NoSQL's ACID compromise. MongoDB does atomic operations only at the document level, he argued, because that covers the "majority" of cases.

"You can do atomic operations on a document in MongoDB, it is durable and it's consistent and isolated but it's only ACID at the document level. It won't do it outside of doc because it could be on different servers," Merriman said. "You have enough ACID for an e-commerce system but you wouldn't build a general ledger system."

While there might be trade-offs for web apps, many, especially those bears in enterprise IT, would disagree that something can be "ACID enough" and argue that something is either ACID or it is not. Surrendering ACID is especially risky because devs will naturally assume its properties exist in the database they're targeting and will save their apps or their data should there be a problem. Without ACID in the database, extra care must be taken.

Articulating this concern is MarkLogic, an XML database provider of 10 years whose non-relational document store now plugs into Hadoop but also plays it old-skool by adhering to ACID. Vice president of product strategy David Gorbet called it "dangerous" to drop ACID.

"Most people assume ACID is across all documents or entities in the data store. If you have to put an asterisk next to that it’s a buyer beware situation," he told us. "If you are building a web site and you know what the data model looks like you can make that decision... if you are building multiple applications on top of a single instance of data and you don’t know all the scenarios - that can be dangerous."

Despite such considerations, Merriman's company claims customers are buying in. It cites Spanish telco Telefonica as one of its customers, now running seven MongoDB projects, up from an initial one.

Have the hatchets been buried? MongoDB at least accepts relational has some good points, but respect is selective, it seems. Expect more flames and sulphur. ®
