MongoDB straps SQL to Google's MapReduce

One toasting too many for NoSQL?

To NoSQLers he's the Devil who flames their work. Bring up his name while interviewing the CEO or founder of any NoSQL start-up, as I have, and the interviewee withers to a tight smile.

Say "Michael Stonebraker" to the database wizards of today, though, and they'll nod sagely at the mention of the pioneer of relational database technology and main architect of INGRES; they believe the NoSQL pups of today are simply re-learning the hard lessons Stonebraker learned years ago.

Not so long ago, NoSQL was hailed by technology hipsters based both mentally and physically in Silicon Valley as the next evolutionary step of the database.

Stonebraker's relational baby had hit a wall: it was a system whose rows, columns, locks and triggers were unable to scale fast, cheaply or dynamically enough, and unable to process fluidly enough the kind of unstructured data fragments that Tweeting and Facebooking sent storming down the pipe.

MongoDB, CouchDB, Cassandra, MapReduce, Hadoop and more: these were the future – scaling through software, not expensive hardware. Crucially, they also dispensed with needing to grapple with another language: SQL. CouchDB devs can code using Erlang instead while MongoDB uses C++.

The problem for NoSQL has been breaking out from the high-octane, big-data worlds of Twitter and Facebook and into the everyday data world of enterprise IT. In this world, the job of building and running database systems cannot – as is currently the case with NoSQL – remain the preserve of a few rocket-scientist-type engineers. Here, salaried jobs rest on the fact that database transactions operate reliably – and reliability is enshrined in the principles of ACID (atomicity, consistency, isolation and durability). But these are principles that NoSQL appears to have sacrificed.

Yet the pendulum is swinging back, and I speak not just of greater tolerance for relational, as Reg regular Matt Asay writes here.

Later this year, MongoDB will creep closer to the world of relational and it will do so in a way that's designed to rectify one of the deficiencies in NoSQL pin-up MapReduce from Google.

MongoDB 2.2, which just hit testing and is due in a couple of months, introduces a programming framework that brings a particular SQL-like feature to this NoSQL database. That feature lets you easily group query results by one or more columns. Called the New Aggregation Framework, it will see MongoDB emulate the familiar SQL group-by function.

MongoDB uses Google's MapReduce for complex analytical tasks; MapReduce lets you batch process petabytes of data using parallel computing while abstracting away the complexity for the programmer. The "map" part of MapReduce provides data transformation while the "reduce" part, er, reduces...

MapReduce might be a rockstar for NoSQLers and an inspiration for Hadoop, but it's not good for batching up results. That's a problem because customers want the group-by functionality of SQL, the evil language used to manage data in relational databases. You can get this feature right now with MapReduce, yes, but not without custom-coding some JavaScript.
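To make the map and reduce steps concrete, here is a minimal sketch of a group-by written in the MapReduce style. It is plain Python running in memory, not MongoDB's JavaScript MapReduce, and the sample records and the names `map_order`, `reduce_amounts` and `map_reduce` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
orders = [
    {"city": "London", "amount": 30},
    {"city": "Madrid", "amount": 20},
    {"city": "London", "amount": 12},
]

def map_order(order):
    """Map step: emit one (key, value) pair per input record."""
    yield order["city"], order["amount"]

def reduce_amounts(key, values):
    """Reduce step: collapse every value emitted for one key into a total."""
    return sum(values)

def map_reduce(records, mapper, reducer):
    """Run the mapper over each record, group emitted pairs by key, then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

totals = map_reduce(orders, map_order, reduce_amounts)
print(totals)  # {'London': 42, 'Madrid': 20}
```

Even in this toy form, a simple "total per city" needs two hand-written functions plus the plumbing between them, which is the verbosity the group-by complaint is about.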

That's where the Framework comes in; while still feeding on MapReduce, it provides a declarative programming model to cut down on the amount of code you hack for queries. It also maps to C++.
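What "declarative" means here can be sketched with a pipeline written as data. The `$group` and `$sum` operators are the ones MongoDB's aggregation framework uses, but the tiny evaluator below (`run_group_stage`, `aggregate`) is a toy written for this illustration that handles only that one case; it is not how MongoDB executes a pipeline, and the sample documents are invented.

```python
# The query is a description of the result, not a procedure:
# group documents by "city" and sum "amount" within each group.
pipeline = [
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}},
]

# Hypothetical sample documents.
orders = [
    {"city": "London", "amount": 30},
    {"city": "Madrid", "amount": 20},
    {"city": "London", "amount": 12},
]

def run_group_stage(docs, spec):
    """Toy evaluator for one $group stage; supports only $sum of a field."""
    key_field = spec["_id"][1:]  # "$city" -> "city"
    sums = {name: acc["$sum"][1:] for name, acc in spec.items() if name != "_id"}
    groups = {}
    for doc in docs:
        group = groups.setdefault(doc[key_field], {"_id": doc[key_field]})
        for name, field in sums.items():
            group[name] = group.get(name, 0) + doc[field]
    return list(groups.values())

def aggregate(docs, stages):
    """Apply each stage in order; anything other than $group is rejected."""
    for stage in stages:
        (op, spec), = stage.items()
        if op != "$group":
            raise ValueError(f"unsupported stage: {op}")
        docs = run_group_stage(docs, spec)
    return docs

result = aggregate(orders, pipeline)
print(result)  # [{'_id': 'London', 'total': 42}, {'_id': 'Madrid', 'total': 20}]
```

The caller writes only the `pipeline` value; the grouping and summing logic lives in the engine. With a driver such as PyMongo, a pipeline of this shape would be passed to a collection's `aggregate` method rather than to a local function.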

Dwight Merriman, the CEO of 10gen, which provides MongoDB support and training, told The Reg on a trip to London last week: "We are building the Aggregation Framework to group by - that's consistent with the way people are using MongoDB. It's more like SQL in that it's declarative."

Merriman, who cut his teeth as co-founder and chief technology officer for DoubleClick - the mega ads network bought by Google in 2007 for $3.1bn - doffed his hat to SQL and relational. While saying MapReduce is capable of so much, he also conceded it's "a little verbose".

"SQL and relational are really good at reporting. This [the Framework] is rounding out the solution to be great at that too. SQL group by is very powerful but MapReduce is much more."

Merriman keeps the NoSQL faith, though. He believes the New Aggregation Framework can be even simpler than using SQL as it's implemented in databases such as Oracle. "It's cleaner to build a query," he said. "If you want to build a query for Oracle, for example, you have to do string concatenation to do the SQL statement. We are writing a query generator."
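The contrast Merriman draws can be sketched in a few lines. Both functions below are hypothetical helpers written for this illustration, not part of any Oracle or MongoDB API: one assembles a SQL statement as text, the other generates the equivalent query as a nested data structure.

```python
def sql_by_concatenation(table, group_col, sum_col):
    """Build a GROUP BY statement by gluing strings together."""
    return ("SELECT " + group_col + ", SUM(" + sum_col + ") FROM " + table
            + " GROUP BY " + group_col)

def group_sum_pipeline(group_field, sum_field):
    """Generate the equivalent query as data, in aggregation-pipeline shape."""
    return [{"$group": {"_id": "$" + group_field,
                        "total": {"$sum": "$" + sum_field}}}]

sql = sql_by_concatenation("orders", "city", "amount")
pipeline = group_sum_pipeline("city", "amount")

print(sql)       # SELECT city, SUM(amount) FROM orders GROUP BY city
print(pipeline)  # [{'$group': {'_id': '$city', 'total': {'$sum': '$amount'}}}]
```

The data-structure form can be inspected, extended or composed by a program without parsing text, which is one reading of "cleaner to build". In practice SQL developers often reach for parameterised statements or query-builder libraries rather than raw concatenation, so the comparison is not quite as one-sided as the quote suggests.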

Merriman also defended NoSQL's ACID compromise. MongoDB limits atomic operations to the document level, he argued, because that covers the "majority" of cases.

"You can do atomic operations on a document in MongoDB, it is durable and it's consistent and isolated but it's only ACID at the document level. It won't do it outside of doc because it could be on different servers," Merriman said. "You have enough ACID for an e-commerce system but you wouldn't build a general ledger system."
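The practical difference between document-level and cross-document atomicity can be sketched with a toy in-memory store. `ToyDocumentStore` and everything around it is invented for this illustration and is not a MongoDB client; it assumes only what the quote says, namely that an update to one document is all-or-nothing while nothing ties two documents together.

```python
import copy

class ToyDocumentStore:
    """In-memory stand-in for a document store; not a real database client."""

    def __init__(self, docs):
        self.docs = copy.deepcopy(docs)

    def update_one(self, doc_id, mutate):
        """Mutate a copy and swap it in only on success: all-or-nothing per document."""
        draft = copy.deepcopy(self.docs[doc_id])
        mutate(draft)  # if this raises, the stored document is untouched
        self.docs[doc_id] = draft

def crash(_doc):
    raise RuntimeError("simulated failure")

# Case 1: both balances live in ONE document, so the transfer is one update.
one_doc = ToyDocumentStore({"accounts": {"alice": 50, "bob": 0}})

def transfer_then_crash(doc):
    doc["alice"] -= 10
    crash(doc)         # fails before bob is credited
    doc["bob"] += 10

try:
    one_doc.update_one("accounts", transfer_then_crash)
except RuntimeError:
    pass
print(one_doc.docs)  # {'accounts': {'alice': 50, 'bob': 0}}

# Case 2: balances live in TWO documents, so the transfer is two updates.
two_docs = ToyDocumentStore({"alice": {"balance": 50}, "bob": {"balance": 0}})

def debit_ten(doc):
    doc["balance"] -= 10

try:
    two_docs.update_one("alice", debit_ten)  # succeeds
    two_docs.update_one("bob", crash)        # fails; the debit is not undone
except RuntimeError:
    pass
print(two_docs.docs)  # {'alice': {'balance': 40}, 'bob': {'balance': 0}}
```

In the first case the failed transfer leaves the data exactly as it was. In the second, ten units have left one account and arrived nowhere, which is the kind of outcome a general ledger cannot tolerate and the reason the application has to take extra care when related data spans documents.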

While there might be trade-offs for web apps, many - especially those bears in enterprise IT - would disagree that something can be "ACID enough" and argue that a database is either ACID or it is not. Surrendering ACID is especially risky because devs will naturally assume its properties exist in the database they're targeting and will save their apps or data should there be a problem. Without ACID in the database, extra care must be taken.

Articulating this concern is MarkLogic, a 10-year-old XML database provider whose non-relational document store now plugs into Hadoop but also plays it old-skool by adhering to ACID. Vice president of product strategy David Gorbet called it "dangerous" to drop ACID.

"Most people assume ACID is across all documents or entities in the data store. If you have to put an asterisk next to that it’s a buyer beware situation," he told us. "If you are building a web site and you know what the data model looks like you can make that decision... if you are building multiple applications on top of a single instance of data and you don’t know all the scenarios - that can be dangerous."

Despite such considerations, Merriman's company claims customers are buying in. It cites Spanish telco Telefonica as one customer, which has spun up seven MongoDB projects, up from an initial one.

Have the hatchets been buried? MongoDB at least accepts relational has some good points, but respect is selective it seems. Expect more flames and sulphur. ®
