The Register® — Biting the hand that feeds IT

Database high priest mud-wrestles Facebook

Rubbishes MySQL. Bitchslaps NoSQL

Mike Stonebraker is famous for slagging Google's backend. And now he's slagging Facebook's too.

Last week, in a piece from our friends at GigaOM, Database Grandpoobah Mike Stonebraker announced that Facebook's continued dependence on MySQL was “a fate worse than death,” insisting that the social network's only route to salvation is to “bite the bullet and rewrite everything.”

We're confident he was quoted warmly and accurately. After all, he said much the same thing to The Register. "Facebook has sharded their social network over something north of 4,000 MySQL instances, and that's nowhere near fast enough, so they've put 9,000 instances of memcached in memory in front of them. They are just dying trying to manage this," Stonebraker recently told us. "They have to do data consistency and crash recovery in user space."
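For readers who haven't had the pleasure, here is a rough sketch of what "data consistency in user space" tends to mean when a cache tier sits in front of a sharded store. It is an illustration only, not Facebook's code: plain Python dicts stand in for the MySQL shards and the memcached tier, and the names `shard_for`, `write_profile` and `read_profile` are hypothetical. Hash-based routing and cache-aside reads are assumed here as one common way to build such a setup.

```python
import hashlib

# Hypothetical stand-ins: plain dicts play the role of the MySQL shards
# and of the memcached tier. Nothing here reflects Facebook's real code.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}


def shard_for(user_id: str) -> dict:
    """Pick a shard by hashing the key (one common routing scheme)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % NUM_SHARDS]


def write_profile(user_id: str, profile: dict) -> None:
    """Write to the owning shard, then invalidate the cached copy.

    The application, not the database, keeps cache and store in step.
    """
    shard_for(user_id)[user_id] = dict(profile)
    cache.pop(user_id, None)


def read_profile(user_id: str):
    """Cache-aside read: try the cache first, fall back to the shard."""
    if user_id in cache:
        return cache[user_id]
    profile = shard_for(user_id).get(user_id)
    if profile is not None:
        cache[user_id] = profile
    return profile
```

Every write path has to remember the invalidation step, and a crash between the shard write and the cache delete leaves a stale entry behind. That bookkeeping is the sort of thing Stonebraker is complaining about.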

Mike Stonebraker

As a professor of computer science at the University of California, Berkeley, Stonebraker helped develop the Ingres and Postgres relational databases, but in an age where ordinary relational databases can't always keep pace with internet-sized applications, he now backs a new breed of distributed in-memory database designed to handle exponentially larger amounts of information. In addition to serving as an adjunct professor at MIT, Stonebraker is the chief technology officer at VoltDB, an outfit that sells this sort of "NewSQL" database.

Stonebraker's Facebook comments drew fire not only from a core database engineer at Mark Zuckerberg's social networking outfit, but also from the recognized kingpin of "cloud computing": Amazon chief technology officer Werner Vogels. Both argue that Stonebraker has no right to his opinion because he's never driven the sort of massive backend that drives the likes of Facebook and Amazon.

But Stonebraker was dead right several years back when he exposed the flaws of the MapReduce distributed number crunching platform that underpinned Google's backend infrastructure – even Google admitted as much – and as vehemently as Facebook defends its MySQL setup, there are other cases where the company has dropped the old school relational database in favor of distributed "NoSQL" platforms such as the Cassandra database built by Facebook and HBase, the open source offering inspired by Google's BigTable.

'Go write a paper'

Twelve hours after GigaOM's article appeared, Facebook database engineer Domas Mituzas unloaded on Stonebraker from somewhere in Lithuania, implying that the longtime professor doesn't understand the demands of a major website. Facebook, he said, focuses on getting the most performance out of "mixed composition" I/O devices rather than in-memory data because it saves the company cash.

"I feel somewhat sad that I have to put this truism out here: disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data. Think Wikipedia without history, think comments that disappear on old posts, together with old posts, think all 404s you hit on various articles you remember from the past and want to read," he wrote. "Building the web that lasts is completely different task from what academia people imagine building the web is."

And he wasn't done. He added that Stonebraker – and some other unnamed database "pioneer" – failed to realize that using disks would save the world. "I already had this issue with [another] RDBMS pioneer...he also suggested that disks are things of the past and now everything has to be in memory, because memory is cheap. And data can be whatever unordered clutter, because CPUs can sort it, because CPUs are cheap," Mituzas wrote.

"Throwing more and more hardware without fine tuning for actual operational efficiency requirements is wasteful and harms our planet. Yes, we do lots of in-memory efficiency work, so that we reduce our I/O, but at the same time we balance the workload so that I/O subsystem provides as efficient as possible delivery of the long tail.

"What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched. What happens in the academia of in-memory databases, if one gets 2x efficiency gain? A paper. What happens when real world doesn’t read your papers anymore? You troll everyone via GigaOM."

That's quite a flame when you consider Stonebraker's pedigree. But Mituzas stood by his post. And he was backed by Vogels. "If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook's data store," the Amazon CTO tweeted. And then he tweeted again: "Scaling systems is like moving customers from single engine Cessna to 747 without them noticing it, with no touchdown & refueling in mid-air."

And again: "Scaling data systems in real life has humbled me. I would not dare criticize an architecture that the holds social graphs of 750M and works".

Stonebraker versus the world

But Stonebraker dares. And whether you agree with his language or not, on some level he has a point. Rather than use MySQL, Facebook built Cassandra for its inbox search tool, and it went with HBase for its new messaging platform. These distributed databases abandon the traditional SQL model in favor of distributed non-relational architectures that can readily scale. "[Facebook] has got something that works: sharding MySQL. But the problem with sharding MySQL is not that it can't be made to work, so much that it's not application transparent across systems," says Jonathan Ellis, the chair of the open source Cassandra project and the CTO of DataStax, the Texas outfit that has commercialized the platform.

"You saw that they went for Cassandra for inbox search and HBase for messaging. The reason they're not doing that on MySQL is that sharding MySQL is a lot of effort and you have to apply that effort to each new project."

The extra twist of the knife is that Stonebraker has little respect for Cassandra or HBase either. VoltDB provides the speed of Cassandra and HBase and other NoSQL databases such as MongoDB, he says, but it retains the relational model. It doesn't limit your transactional semantics. He calls this NewSQL, in clear response to the NoSQL movement.

"At least for new OLTP applications, giving up ACID and giving up SQL is a terrible idea. You don't have to give up either of those. You can go fast without giving up either. If you give up ACID, you end up pushing data consistency into the application logic and that's just way harder to do," Stonebraker tells us.

"We've benchmarked ourselves against Cassandra on TPC-C, and we're a factor of five faster...the difference between NoSQL and NewSQL performance is a very big number."
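To make the ACID half of that argument concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is swapped in purely because it ships with Python; it is not VoltDB, Cassandra or anything else named in this story. The `accounts` table and the `transfer` helper are hypothetical. The point shown is atomicity: when the second statement fails, the database undoes the first, so the application needs no cleanup code of its own.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back along with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

Without transactions, the application would have to notice the failed debit and reverse the credit itself, and then cope with crashing halfway through that reversal. That is the "application logic" burden Stonebraker is describing.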

Asked about Stonebraker's claims, DataStax's Jonathan Ellis argues that VoltDB has its own limitations. "There's a ton of limitations that VoltDB marketing doesn't tell you about," he says. "We've had people complain that you have to do queries within a partition and that if you step outside of that, it doesn't warn you. I also think that the focus on in-memory-only is limiting. Almost all Cassandra users have datasets larger than memory, and some subset of that will be active at any time.

"Either you have to buy ten times as many servers so you can fit the whole thing in RAM or license something like [the HP realtime processing engine] Vertica to put it offline. Neither is a compelling story."

So, it's Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was "no way" to do distributed joins in a way that really scales. "I'm not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational," he said.

"You can do distributed transactions, but if you do them with no loss of generality and you do them across a thousand machines, it's not going to be that fast."

Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. "I reject what Merriman says out of hand," he tells The Register. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don't matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. "Let the bake-off begin," he crows.

Of course, as Facebook points out, speed isn't everything. In the end, there's no deciding this debate. Not that we would want to. It's too much fun. ®

Nobody says he does not

The flamewar masks the actual conflict here.

What Stonebraker advocates for is essentially two-tier architectures. Front-end talking to super-scalable back-end which directly manipulates data. No baby-sitting middleware.

What facebook and everyone who wants to scale do is three tiers - front-end, middleware, database. In most three tiers the middleware does LOTS of work in terms of data availability, integrity and performance (it is an ecumenical matter where memcached sits, but IMO it is a part of the middle tier).

In reality Stonebraker is probably right technically.

However, similarly he is definitely wrong in terms of realities of life. 99%+ of the staff you can hire cannot and will not learn how to talk to the ACID backend and _WANT_ the middleware so they can get their work done. Similarly, 99% of software architects and project managers _WANT_ the middleware to ensure that developers do not do something vehemently stupid with the data.

As a result, like it or not, the middle tier is there anyway. If it is there, however, you might as well make it do a few things which Stonebraker abhors.


@Destroy All Monsters

> SQL is just the crappy insanely dumb query language on top of whatever your database is

What an utterly idiotic and ignorant statement to make.... SQL is not crap. SQL is not dumb. SQL is the fastest method to crunch data on SQL-based databases. And it can scale very well.

Of course - this also depends on how SQL is used. A simple thing like using bind variables is often ignored. SQL gets used as a mere I/O layer, the way one would treat kernel device read() and write() calls.

That type of ignorant use of SQL... no wonder your application and database performance sucks.

On my databases - running 1000+ SQLs per second is the norm. Not the exception. And there's no way in hell that you could ever get that performance by pulling db data into a client process, crunching it there and shipping it (across process and memory and even h/w boundaries) back to the database.

The biggest monster of all?

Ignorance.
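The bind-variable point in the comment above can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption, not the commenter's own setup: the `users` table and both helper functions are hypothetical, and SQLite's `?` placeholder stands in for whatever bind syntax a given database uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.commit()


def find_user_unbound(conn, user_id):
    """Anti-pattern: the value is pasted into the SQL text.

    Every distinct user_id yields a different statement string, so
    nothing can be reused, and the query is open to SQL injection.
    """
    sql = f"SELECT name FROM users WHERE id = {user_id}"
    return conn.execute(sql).fetchone()


def find_user_bound(conn, user_id):
    """Bind variable: the SQL text is constant, the value travels apart.

    A constant statement string lets the driver and the database reuse
    the already-parsed statement across calls.
    """
    sql = "SELECT name FROM users WHERE id = ?"
    return conn.execute(sql, (user_id,)).fetchone()
```

Both functions return the same rows for well-behaved integer input. The difference is in how much parsing work the database repeats, and in what happens when the input is not well behaved.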


Stonebraker senile? Hardly

Having worked with his Illustra crew while at Informix, I'm going to give Stonebraker the benefit of the doubt.

Sure it took 5 years for Informix to absorb the extensibility of Illustra and to make it scale. But it was done and the current release 11.7x definitely has a lot going for it. (Except that Mills and company at IBM still baby DB2... ;-) [Yes I'm that Gumby and yes I'm an Informix bigot]

The point is that Stonebraker knows his stuff and he's actually right in some of his comments.

Sharding data isn't a good idea.
