Facebook: Why our 'next-gen' comms ditched MySQL

Cassandra founder plays Ace of HBase

High performance access to file storage

About a year ago, when Facebook set out to build its email-meets-chat-meets-everything-else messaging system, the company knew its infrastructure couldn't run the thing. "[The Facebook infrastructure] wasn't really ready to handle a bunch of different forms of messaging and have it happen in real time," says Joel Seligstein, a Facebook engineer who worked on the project. So Seligstein and crew mocked up a multifaceted messaging prototype, tossed it onto various distributed storage platforms, and ran a Big Data bake off.

The winner was HBase, the open source distributed database modeled after Google's proprietary BigTable platform. Facebook was already using MySQL for message storage, the open source Cassandra platform for inbox search, and the proprietary Haystack platform for storing photos. But in the company's mind, HBase was better equipped to handle a new-age messaging system that would seek to seamlessly juggle email, chat, and SMS as well as traditional on-site Facebook messages.

Originally built by Powerset – a semantic search outfit now owned by Microsoft – HBase is part of the Apache Hadoop project, a sweeping effort to mimic Google's back-end infrastructure. Like so many other big-name web outfits – including Microsoft, Yahoo!, and Twitter – Facebook uses the core Hadoop MapReduce platform to handle its back-end number-crunching tasks. It now spans a petabyte-scale cluster. "MapReduce is so useful," says Facebook infrastructure guru Karthik Ranganathan. "It's like the air you breathe."

Like Hadoop MapReduce, HBase uses the distributed Hadoop File System (HDFS), and Facebook chose it in part because it has experience scaling and debugging HDFS. But Hadoop isn't the only distributed platform the company uses on the back end. HBase was chosen, the company says, because it suited the task at hand.

The new Facebook messaging system uses Hbase to store the text and metadata for the messages as well as the indices needed to search these messages – an epic amount of data. Even before the new messaging system was rolled out, Facebook was juggling about 15 billion on-site messages a month (about 14 TB) and 120 billion chat messages (11 TB). It needed about three times more space to store all that data, with the average message going to multiple people, and as the new system adds good, old-fashioned email (from @facebook.com addresses), the storage requirements will only expand. HBase doesn't store everything – Haystack handles attachments and overly large messages – but it handles most of the data that's frequently updated.

"The email workload is a write dominated workload. We need to take a lot of writes very quickly," Ranganathan says. "We used HBase for the data the grows very fast, which is essentially the metadata."

Speaking alongside Seligstein at a recent "tech talk" that examined the infrastructure behind the company's new messaging system, unveiled last month, Ranganathan explained that the company chose HBase in part because its "strong consistency model." As it replicates data across myriad machines, he says, HBase is particularly adept at keeping all those replicas synchronized. "If something fails, we want to be able to read the data from somewhere else," Ranganathan says. "Once it's written, it doesn't matter what replica you go to. It's always in sync."

Facebook also picked HBase because it's designed for automatic failover. When a server goes down, it automatically reverts to another – at least in theory. "When you're talking about a lot of data, you're obviously talking about a lot of machines, and when you're talking about a lot of machines, failure is the norm," Ranganathan says. "When failure is the norm, we want to be able to automatically serve that data without too much intervention or it taking too much time."

At the same time, Ranganathan and crew like the way HBase works to prevent failure. As he puts it, when HBase breaks data into pieces and distributes them across a cluster, it puts multiple and completely separate data "shards" on each machine. This means that if one server dies, its work can be picked up several different machines and not just one. In other words: better load balancing.

If you put your data shards on just a few machines and a machine goes down, Ranganathan says, the burden is picked up by a single server, and this may end-up toppling other servers like dominos. "[The second server] dies, and then two machines' slack has to be taken up by a third, and this thing cascades all the way down," he says. "HBase shards the data into a lot of virtual shards and puts multiple shards on a single machine. So a machine dying spreads [its data] across multiple machines and the utilization of your entire cluster goes up."

Ranganathan also likes HBase's LZO (Lempel-Ziv-Oberhumer) data compression – "it's easier on the CPU," he says – and he likes that when you read, modify, and write in HBase, you can expect all readers to get the same values. "This really helps with counting – number of unread messages and things like that," he says.

But HBase did require modification. Facebook engineer Nicolas Spiegelberg and another company developer spent a year adding commits to the platform, mainly in an effort to minimize data loss. "The goal was zero data loss," says Spiegelberg. Committers updated HDFS Sync support, added some ACID properties, and even redesigned the HBase master.

Ranganathan and company used Haystack for attachment and very large messages because they didn't need the same write speed with these beefy files and Haystack could use far fewer servers in replicating the files across data centers. But they ditched MySQL entirely. They felt it couldn't provide quick access to such a large amount of data. "We have a lot of data that's quick and temporal," Seligstein says. "You need access to your first page of messages quite often."

For many, it's surprising that Facebook didn't put the messaging system on Cassandra, which was originally developed at the company and has since become hugely popular across the interwebs. But Ranganathan and others felt that it was too difficult to reconcile Cassandra's "eventual consistency model" with the messaging system setup.

"If you're going to front the data with some sort of a cache...then it becomes very difficult for use to program a cache where the database heals from underneath," Ranganathan says. "For a product like messaging, we needed a strong consistency model [as HBase offers]." If a user sends an email, he says, you have to be able to tell the user – immediately the email was sent. "If it doesn't show that it's been sent, I think 'Oh, I didn't send it,' and then I send it again. Then I come in the next day and I see it's been sent twice, and I get pissed at Facebook."

He also felt that the system needed HBase physical replication as opposed to Cassandra's logical replication. With Cassandra, he says, if something gets corrupted and you lose a disk and you have to restore the data, you have to restore the entire replica. Because Facebook uses very dense machines, this would mean a very long recovery time.

Some Cassandra backers don't see it that way. But Ranganathan is at least worth listening to. He's one of Cassandra's original authors. ®

Combat fraud and increase customer satisfaction

More from The Register

next story
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database sqares the circle of cheap memory vs speed
Oh no, Joe: WinPhone users already griping over 8.1 mega-update
Hang on. Which bit of Developer Preview don't you understand?
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Half of Twitter's 'active users' are SILENT STALKERS
Nearly 50% have NEVER tweeted a word
Internet-of-stuff startup dumps NoSQL for ... SQL?
NoSQL taste great at first but lacks proper nutrients, says startup cloud whiz
IRS boss on XP migration: 'Classic fix the airplane while you're flying it attempt'
Plus: Condoleezza Rice at Dropbox 'maybe she can find ... weapons of mass destruction'
Ditch the sync, paddle in the Streem: Upstart offers syncless sharing
Upload, delete and carry on sharing afterwards?
New Facebook phone app allows you to stalk your mates
Nearby Friends feature goes live in a few weeks
prev story


Top three mobile application threats
Learn about three of the top mobile application security threats facing businesses today and recommendations on how to mitigate the risk.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
SANS - Survey on application security programs
In this whitepaper learn about the state of application security programs and practices of 488 surveyed respondents, and discover how mature and effective these programs are.