Facebook: Why our 'next-gen' comms ditched MySQL

Cassandra founder plays Ace of HBase

The smart choice: opportunity from uncertainty

About a year ago, when Facebook set out to build its email-meets-chat-meets-everything-else messaging system, the company knew its infrastructure couldn't run the thing. "[The Facebook infrastructure] wasn't really ready to handle a bunch of different forms of messaging and have it happen in real time," says Joel Seligstein, a Facebook engineer who worked on the project. So Seligstein and crew mocked up a multifaceted messaging prototype, tossed it onto various distributed storage platforms, and ran a Big Data bake off.

The winner was HBase, the open source distributed database modeled after Google's proprietary BigTable platform. Facebook was already using MySQL for message storage, the open source Cassandra platform for inbox search, and the proprietary Haystack platform for storing photos. But in the company's mind, HBase was better equipped to handle a new-age messaging system that would seek to seamlessly juggle email, chat, and SMS as well as traditional on-site Facebook messages.

Originally built by Powerset – a semantic search outfit now owned by Microsoft – HBase is part of the Apache Hadoop project, a sweeping effort to mimic Google's back-end infrastructure. Like so many other big-name web outfits – including Microsoft, Yahoo!, and Twitter – Facebook uses the core Hadoop MapReduce platform to handle its back-end number-crunching tasks. It now spans a petabyte-scale cluster. "MapReduce is so useful," says Facebook infrastructure guru Karthik Ranganathan. "It's like the air you breathe."

Like Hadoop MapReduce, HBase uses the distributed Hadoop File System (HDFS), and Facebook chose it in part because it has experience scaling and debugging HDFS. But Hadoop isn't the only distributed platform the company uses on the back end. HBase was chosen, the company says, because it suited the task at hand.

The new Facebook messaging system uses Hbase to store the text and metadata for the messages as well as the indices needed to search these messages – an epic amount of data. Even before the new messaging system was rolled out, Facebook was juggling about 15 billion on-site messages a month (about 14 TB) and 120 billion chat messages (11 TB). It needed about three times more space to store all that data, with the average message going to multiple people, and as the new system adds good, old-fashioned email (from @facebook.com addresses), the storage requirements will only expand. HBase doesn't store everything – Haystack handles attachments and overly large messages – but it handles most of the data that's frequently updated.

"The email workload is a write dominated workload. We need to take a lot of writes very quickly," Ranganathan says. "We used HBase for the data the grows very fast, which is essentially the metadata."

Speaking alongside Seligstein at a recent "tech talk" that examined the infrastructure behind the company's new messaging system, unveiled last month, Ranganathan explained that the company chose HBase in part because its "strong consistency model." As it replicates data across myriad machines, he says, HBase is particularly adept at keeping all those replicas synchronized. "If something fails, we want to be able to read the data from somewhere else," Ranganathan says. "Once it's written, it doesn't matter what replica you go to. It's always in sync."

Facebook also picked HBase because it's designed for automatic failover. When a server goes down, it automatically reverts to another – at least in theory. "When you're talking about a lot of data, you're obviously talking about a lot of machines, and when you're talking about a lot of machines, failure is the norm," Ranganathan says. "When failure is the norm, we want to be able to automatically serve that data without too much intervention or it taking too much time."

At the same time, Ranganathan and crew like the way HBase works to prevent failure. As he puts it, when HBase breaks data into pieces and distributes them across a cluster, it puts multiple and completely separate data "shards" on each machine. This means that if one server dies, its work can be picked up several different machines and not just one. In other words: better load balancing.

If you put your data shards on just a few machines and a machine goes down, Ranganathan says, the burden is picked up by a single server, and this may end-up toppling other servers like dominos. "[The second server] dies, and then two machines' slack has to be taken up by a third, and this thing cascades all the way down," he says. "HBase shards the data into a lot of virtual shards and puts multiple shards on a single machine. So a machine dying spreads [its data] across multiple machines and the utilization of your entire cluster goes up."

Ranganathan also likes HBase's LZO (Lempel-Ziv-Oberhumer) data compression – "it's easier on the CPU," he says – and he likes that when you read, modify, and write in HBase, you can expect all readers to get the same values. "This really helps with counting – number of unread messages and things like that," he says.

But HBase did require modification. Facebook engineer Nicolas Spiegelberg and another company developer spent a year adding commits to the platform, mainly in an effort to minimize data loss. "The goal was zero data loss," says Spiegelberg. Committers updated HDFS Sync support, added some ACID properties, and even redesigned the HBase master.

Ranganathan and company used Haystack for attachment and very large messages because they didn't need the same write speed with these beefy files and Haystack could use far fewer servers in replicating the files across data centers. But they ditched MySQL entirely. They felt it couldn't provide quick access to such a large amount of data. "We have a lot of data that's quick and temporal," Seligstein says. "You need access to your first page of messages quite often."

For many, it's surprising that Facebook didn't put the messaging system on Cassandra, which was originally developed at the company and has since become hugely popular across the interwebs. But Ranganathan and others felt that it was too difficult to reconcile Cassandra's "eventual consistency model" with the messaging system setup.

"If you're going to front the data with some sort of a cache...then it becomes very difficult for use to program a cache where the database heals from underneath," Ranganathan says. "For a product like messaging, we needed a strong consistency model [as HBase offers]." If a user sends an email, he says, you have to be able to tell the user – immediately the email was sent. "If it doesn't show that it's been sent, I think 'Oh, I didn't send it,' and then I send it again. Then I come in the next day and I see it's been sent twice, and I get pissed at Facebook."

He also felt that the system needed HBase physical replication as opposed to Cassandra's logical replication. With Cassandra, he says, if something gets corrupted and you lose a disk and you have to restore the data, you have to restore the entire replica. Because Facebook uses very dense machines, this would mean a very long recovery time.

Some Cassandra backers don't see it that way. But Ranganathan is at least worth listening to. He's one of Cassandra's original authors. ®

Securing Web Applications Made Simple and Scalable

More from The Register

next story
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
Mozilla fixes CRITICAL security holes in Firefox, urges v31 upgrade
Misc memory hazards 'could be exploited' - and guess what, one's a Javascript vuln
Put down that Oracle database patch: It could cost $23,000 per CPU
On-by-default INMEMORY tech a boon for developers ... as long as they can afford it
Google shows off new Chrome OS look
Athena springs full-grown from Chromium project's head
Apple: We'll unleash OS X Yosemite beta on the MASSES on 24 July
Starting today, regular fanbois will be guinea pigs, it tells Reg
HIDDEN packet sniffer spy tech in MILLIONS of iPhones, iPads – expert
Don't panic though – Apple's backdoor is not wide open to all, guru tells us
prev story


Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Top 8 considerations to enable and simplify mobility
In this whitepaper learn how to successfully add mobile capabilities simply and cost effectively.
Seven Steps to Software Security
Seven practical steps you can begin to take today to secure your applications and prevent the damages a successful cyber-attack can cause.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.