Facebook's new comms: 'our largest ever engineering project'

15 whole engineers tackle HBase!

The creation of Facebook's new messaging system was the company's largest-ever engineering project, according to director of engineering Andrew Bosworth.

The project spanned more than a year, according to company founder Mark Zuckerberg, and it included the roll-out of a new distributed database platform. The system uses HBase, the open source incarnation of Google's BigTable platform. HBase was originally built by the semantic search outfit Powerset, now owned by Microsoft, and it is part of the Apache Hadoop project, which mimics multiple pieces of Google's proprietary infrastructure.

It's telling, however, that the largest engineering project in the history of Facebook was handled by a mere 15 engineers.

The system is designed to provide a single interface for handling email, IM, text messages, and on-site Facebook messages, and it includes a single archive for all these various communications. To accommodate this, Bosworth said, the company needed a new database infrastructure. The company was already using MySQL as the primary repository for user data, the open source Cassandra platform for inbox search, the Hadoop-friendly SQL-like language known as Hive for analytics, and the proprietary Haystack for photos. But HBase provided something different.

"In order to support this really cool and very deep scenario...we needed to rebuild our [messaging] infrastructure," Bosworth said. "Over a year ago, we started looking at storage systems...We weren't sure of the trade-offs [with Cassandra]. We tested MySQL, but we weren't sure it could perform with long-tail data. So we invested in HBase." HBase is built atop HDFS, the Hadoop Distributed File System, an open source incarnation of the old Google File System (GFS).

"Because we want to expose the long-tail of your conversation history really easily, a log-based storage system like HBase makes the most sense," Bosworth told The Reg. "Cassandra — which we love and we built — has some trade-offs around consistency. Because we want this to be real-time — so you always know what messages you're getting — we didn't like those trade-offs."

But the platform also taps Haystack — the existing photo infrastructure — to provide support for email attachments.

Asked if Facebook has any intention of standardizing on a single database platform — if juggling so many platforms would eventually cause unwanted issues — Bosworth told us that for the time being, the company intends to use separate platforms for separate tasks. "With Facebook's technology stack in general, we've really tried to use the right technology for the problem we're solving," he said. "You can get into trouble over-standardizing the technology.

"You build a round hole because you have a round peg. But then you get a square peg and you can't fit it into the hole. We approach things differently."

Also differently from Google.

"We have small, really quick engineering teams. This is the biggest engineering team we've ever built around a new product, and it's still only 15 engineers." ®
