Facebook's new comms: 'our largest ever engineering project'

15 whole engineers tackle HBase!

The creation of Facebook's new messaging system was the company's largest-ever engineering project, according to director of engineering Andrew Bosworth.

The project spanned more than a year, according to company founder Mark Zuckerberg, and it included the roll-out of a new distributed database platform. The system uses HBase, an open source incarnation of Google's BigTable platform. HBase was originally built by the semantic search outfit Powerset, now owned by Microsoft, and it is part of the Apache Hadoop project, which mimics multiple pieces of Google's proprietary infrastructure.

It's telling, however, that the largest engineering project in the history of Facebook was handled by a mere 15 engineers.

The system is designed to provide a single interface for handling email, IM, text messages, and on-site Facebook messages, and it includes a single archive for all these various communications. To accommodate this, Bosworth said, the company needed a new database infrastructure. The company was already using MySQL as the primary repository for user data, the open source Cassandra platform for inbox search, the Hadoop-friendly SQL-like language known as Hive for analytics, and the proprietary Haystack for photos. But HBase provided something different.

"In order to support this really cool and very deep scenario...we needed to rebuild our [messaging] infrastructure," Bosworth said. "Over a year ago, we started looking at storage systems...We weren't sure of the trade-offs [with Cassandra]. We tested MySQL, but we weren't sure it could perform with long-tail data. So we invested in HBase." HBase is built atop HDFS, the Hadoop Distributed File System, an open source incarnation of the old Google File System (GFS).

"Because we want to expose the long-tail of your conversation history really easily, a log-based storage system like HBase makes the most sense," Bosworth told The Reg. "Cassandra — which we love and we built — has some trade-offs around consistency. Because we want this to be real-time — so you always know what messages you're getting — we didn't like those trade-offs."
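To make the "log-based storage" idea concrete, here is a minimal Python sketch of a log-structured store. It is a toy illustration only, not HBase's implementation and not Facebook's code: the class name `LogStructuredStore`, the `flush_threshold` parameter, and the `user:sequence` key layout are all assumptions made up for this example. Writes land in an in-memory buffer that is periodically flushed to immutable sorted segments, and a prefix scan over the sorted keys pulls back one user's whole history, old messages included.

```python
import bisect


class LogStructuredStore:
    """Toy log-structured store (hypothetical, for illustration only)."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold
        self.memtable = {}   # most recent writes: key -> value
        self.segments = []   # immutable sorted (key, value) lists, oldest first

    def put(self, key, value):
        # Writes never modify old segments; they only touch the buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the buffer into a new sorted, immutable segment.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first so the latest write wins.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def scan(self, prefix):
        # Merge oldest to newest so later writes overwrite earlier ones.
        merged = {}
        for segment in self.segments:
            i = bisect.bisect_left(segment, (prefix,))
            while i < len(segment) and segment[i][0].startswith(prefix):
                merged[segment[i][0]] = segment[i][1]
                i += 1
        for key, value in self.memtable.items():
            if key.startswith(prefix):
                merged[key] = value
        return sorted(merged.items())
```

With keys such as `alice:0001` and `alice:0002` (an assumed layout), `scan("alice:")` returns that user's messages in order regardless of which segment each one was flushed to.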

But the platform also taps Haystack — the existing photo infrastructure — to provide support for email attachments.

Asked if Facebook has any intention of standardizing on a single database platform — if juggling so many platforms would eventually cause unwanted issues — Bosworth told us that for the time being, the company intends to use separate platforms for separate tasks. "With Facebook's technology stack in general, we've really tried to use the right technology for the problem we're solving," he said. "You can get into trouble over-standardizing the technology.

"You build a round hole because you have a round peg. But then you get a square peg and you can't fit it into the hole. We approach things differently."

Also differently from Google.

"We have small, really quick engineering teams. This is the biggest engineering team we've ever built around a new product, and it's still only 15 engineers." ®
