Facebook Data Center: If it won't run ARM, what will it run?
Zuckerberg to unmask shiny new backend
In August, the rumor was that Facebook planned to pack its first custom-built data center with ARM servers, abandoning traditional x86 chips from the likes of Intel and AMD. The trouble was that the rumor arrived via a site calling itself SemiAccurate, and Facebook promptly told the world it wasn't accurate at all.
But on Thursday morning, Facebook will unveil a change to both the hardware and software running its back-end infrastructure, and it seems that the company will finally tell us what will go into its first custom-built data center, a facility under construction in Prineville, Oregon. Facebook may not be using ARM servers, but judging from comments the company has made in the past, we can't help but wonder if it's taking at least a small step towards the new breed of server based on low-power processors – and possibly towards massively multicore servers, machines that cram hundreds of cores into a single chassis.
Late last week, the social networking giant invited certain members of the press to an April 7 event during which the company will provide a "behind-the-scenes look at the latest technology powering Facebook." A company spokesman tells The Register that the news involves changes to both its hardware and software infrastructure, and it would seem that these changes will be rolled out at its new data center.
Facebook declined to say more. But on two separate occasions over the past two years, Jonathan Heiliger, vice president of technical operations at Facebook, hinted that the company was eyeing a move beyond traditional Intel- and AMD-based x86 servers. And last month, the company specifically indicated that it is testing "micro servers" based on Intel's low-power Xeon E3-1200 series processors.
In the spring of 2009, at a conference in San Francisco, Heiliger roasted Intel and AMD as well as the big name server vendors, saying they weren't providing the sort of hardware needed to drive its massive web infrastructure. "The biggest thing that surprised us - or is about to surprise us - is the less-than-anticipated performance gains from new microarchitectures," he said, following the release of AMD's "Istanbul" Opteron chip and Intel's "Nehalem EP" Xeon.
"The performance gains they are touting in the press, we are not seeing in our applications. We are literally in real time trying to figure out why that is and if there are optimizations that we can do. Otherwise, we are kind of left with current-generation technology and current-generation scale."
He was also critical of the power consumption exhibited by machines from the mainstream server vendors. "I am not sure whether to be embarrassed or pleased for the OEM and system vendors in the audience, but you guys just don't get it," he said, before accusing server makers of "failing us".
Then, a year later, at the same conference, he hinted that Facebook was interested in alternative hardware from chip maker Tilera and server builder SeaMicro. Last year, both of these companies unveiled hardware that lets you run several thousand cores in a single rack while consuming less power than a rack of standard x86 servers.
It's unlikely Facebook will make the leap to SeaMicro, and though Tilera-based machines are a possibility, it's far more likely that the social-networking giant will move to so-called micro servers based on Intel's E3 chips. At the press event touting the "Sandy Bridge" Xeon E3-1200 processors three weeks ago, Intel trotted out Gio Coglitore, director of Facebook Labs. Coglitore said that Facebook has been testing micro servers and that it may deploy them in 2011 or 2012.
The other thing to consider is that Facebook is a flagship customer of Dell's Data Center Solutions (DCS) group, a Dell division that designs bespoke servers on behalf of companies running unusually large online services. Dell tells us that it will be present at the Facebook announcement on Thursday.
Facebook never revealed the Dell custom server designs it's using in its existing leased data centers, but it's safe to assume they've inspired at least some of the PowerEdge-C servers that Dell has been selling for the past year. The question is whether Facebook and Dell – perhaps with the help of an outside manufacturer – are now building machines reminiscent of the Tilera and SeaMicro hardware Heiliger talked up last spring or the micro servers touted by Coglitore.
In October 2009, Tilera announced plans to squeeze 100 cores onto a single chip die, and last year, it said it would eventually push its design to 200 cores. Its Tile-Gx chips run Linux directly, and they're specifically designed for the LAMP stack, which also includes the Apache web server, MySQL, and PHP. Facebook is a LAMP shop, though it's now converting its PHP to C++ using an open source code transformer it built called HipHop for PHP.
Tilera has received $10m in funding from Quanta Computer, the Taiwanese manufacturer that's now building more laptops than any other company in the world, and Quanta has started shipping servers that use 64-core Tilera chips. Quanta's machine includes two boards, each with four chips, for a total of 512 cores in a 2U chassis. According to Quanta, the machine can replace a two-socket Intel Xeon server while consuming a quarter of the power per node (50 watts for a single-socket TilePro64 node compared to around 200 watts per Xeon node).
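For anyone who wants to check the math, here is a minimal sketch of the arithmetic behind those figures. The numbers are the ones Quanta quotes above; the variable names are ours and purely illustrative.

```python
# Figures as quoted by Quanta in the text above; names are illustrative only.
BOARDS_PER_CHASSIS = 2
CHIPS_PER_BOARD = 4
CORES_PER_CHIP = 64      # 64-core Tilera chip
CHASSIS_HEIGHT_U = 2

TILE_NODE_WATTS = 50     # single-socket TilePro64 node
XEON_NODE_WATTS = 200    # "around 200 watts" per Xeon node

# Total cores in one 2U chassis, and the resulting density per rack unit.
total_cores = BOARDS_PER_CHASSIS * CHIPS_PER_BOARD * CORES_PER_CHIP
cores_per_rack_unit = total_cores // CHASSIS_HEIGHT_U

# Per-node power of the Tilera node relative to the Xeon node.
power_ratio = TILE_NODE_WATTS / XEON_NODE_WATTS

print(f"{total_cores} cores per chassis, {cores_per_rack_unit} per U")
print(f"Tilera node draws {power_ratio:.0%} of the Xeon node's power")
```

Note that this compares power per node only. It says nothing about how much work each node gets done, which is the part that matters.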
Quanta has also been known to build PCs for Dell, and it builds its own line of x64 servers.
Meanwhile, SeaMicro has introduced a 512-core, 10U server based on Intel's low-power Atom chip. The original SM10000 was based on a 32-bit Atom Z530 chip and SeaMicro is now shipping a version based on the 64-bit Atom N570 processor. The rub is that the Atom-based system board is limited to 4GB per server node, even though it has 64-bit addressing and can, in theory, address much more than that.
The case for Intel
When you consider this memory limitation – and the fact that Facebook has a close relationship with Dell – it's unlikely the social networking giant will embrace SeaMicro anytime soon. Tilera machines could happen, but there are obstacles here as well.
With most hyperscale data-center operators, such as Google, Facebook, Yahoo!, and Microsoft, the idea is to standardize on a very limited number of servers with precise configurations. By standardizing hardware, workloads can move more fluidly across machines. Plus, support is easier. For similar reasons, telecommunications companies have for decades rigidly standardized their switching equipment, often ignoring new technology improvements in favor of simplicity.
Given this, Facebook may be hesitant to introduce a new instruction set from someone like Tilera. The economics of the platform would have to be dramatically better than the x86 architecture. It would mean corralling a portion of Facebook's workloads on Tilera machines, where they could no longer move freely to the rest of its servers.
That said, if Facebook has a new workload in mind, a workload that isn't going to run across all data centers anyway, a Tilera architecture might work just fine. Let's say, for instance, that Facebook wants to launch a, well, Google-battling search engine. A data center of Tilera boxes might be better at this particular task than its existing x86 machinery in terms of performance per watt or dollars per performance per watt.
But we're guessing that Facebook is moving to Intel micro servers. With a micro server, you put four sticks of memory and a single socket onto a small motherboard and then cram as many as you can into a rack chassis that has shared power and dedicated disks for each server node. This is not to be confused with a blade server: a micro server has neither a shared backplane for linking to networking nor a central management coprocessor in the chassis. And if we're lucky, micro servers will adhere to the SSI Forum standard, so that you can mix and match models from disparate vendors. Blade servers never developed a standard.
At Intel's press event, Coglitore said that Facebook would not move to a 32-bit memory footprint, which limits you to 4GB of memory per server node. This, he said, does not make sense for existing Facebook workloads. That precludes the use of SeaMicro's Atom-based machines or any ARM processor. But the second generation of Tile-Gx processors from Tilera has 64-bit addressing, so this is at least a possibility. And the single-socket Xeon E3 chip offers 64-bit memory addressing, so it can support 32GB of memory in four slots using 8GB memory sticks.
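The 4GB ceiling Coglitore mentions falls straight out of the pointer width. Here is a minimal sketch, assuming a flat, byte-addressed memory space with no extensions; the function name is ours and purely illustrative.

```python
GIB = 2**30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Memory reachable with an address of the given width, in GiB.

    Assumes a flat, byte-addressed space: each distinct address names
    exactly one byte, so there are 2**address_bits bytes in total.
    """
    return 2**address_bits // GIB


print("32-bit:", addressable_gib(32), "GiB")
print("64-bit:", addressable_gib(64), "GiB")
```

The 64-bit figure is a theoretical bound. What a given node can actually hold is set by its memory controller and the number of slots on the board, which is why a 64-bit Atom board can still top out at 4GB.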
Brawny v Wimpy
Facebook has long leased data center space from third parties, but in January 2010 it announced that it would build its own facility in Prineville, and the primary aim of the facility is to improve efficiency.
"We are designing a facility that will be highly efficient and cost-effective for our operations today and into the future," Jonathan Heiliger wrote in a blog post at the time. "As our user base continued to grow and we developed Facebook into a much richer service, we reached the point where it was more efficient to lease entire buildings on our own. We are now ready to build our own."
Facebook has said it will use outside air and evaporated water to cool the facility, forgoing chillers. But it only stands to reason that the company will seek to improve efficiency at the system level as well.
The company is following in the footsteps of Google. Mountain View now runs around 36 custom-built data centers across the globe, including at least one that runs without chillers, and it has long built its own servers and racks in an effort to improve efficiency.
But Google uses nothing but commodity hardware, including standard server chips from Intel and AMD. And in a paper published last year, Google senior vice president of operations Urs Hölzle warned against taking parallelization too far, downplaying the benefits of extreme multi-core processors. He said that chips that spread workloads across more energy-efficient but slower cores may not be preferable to chips with faster but power-hungry cores.
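Hölzle's argument can be made concrete with Amdahl's law, which caps the speedup you get when only part of a job runs in parallel. What follows is a minimal sketch, not a model of any real workload. The core counts, the relative core speed of 0.4, and the parallel fractions are hypothetical numbers picked for illustration, and the function names are ours.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only
    `parallel_fraction` of the work can be spread across `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def request_time(parallel_fraction: float, cores: int, core_speed: float) -> float:
    """Relative time to finish one request.

    `core_speed` is single-core speed relative to a brawny core (1.0).
    The serial part runs on one core; the parallel part is split evenly.
    """
    serial_fraction = 1.0 - parallel_fraction
    return (serial_fraction + parallel_fraction / cores) / core_speed


# Hypothetical chips: 4 brawny cores vs 16 wimpy cores at 40% of the speed.
for p in (0.90, 0.99):
    brawny = request_time(p, cores=4, core_speed=1.0)
    wimpy = request_time(p, cores=16, core_speed=0.4)
    winner = "brawny" if brawny < wimpy else "wimpy"
    print(f"parallel fraction {p:.2f}: brawny={brawny:.3f} wimpy={wimpy:.3f} -> {winner}")
```

With these made-up numbers, the brawny chip finishes first when 10 per cent of the request is serial, and the wimpy chip only pulls ahead when the serial share drops to 1 per cent. The serial slice runs on a single slow core no matter how many cores sit beside it.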
"Slower but energy efficient 'wimpy' cores only win for general workloads if their single-core speed is reasonably close to that of midrange 'brawny' cores," he said, warning that wimpy cores run into Amdahl's law, which says that when you parallelize only part of a system, there is a limit to performance improvement.
"So why doesn’t everyone want wimpy-core systems?" Hölzle wrote. "Because in many corners of the real world, they’re prohibited by law – Amdahl’s law. Even though many Internet services benefit from seemingly unbounded request- and data-level parallelism, such systems aren’t above the law. As the number of parallel threads increases, reducing serialization and communication overheads can become increasingly difficult. In a limit case, the amount of inherently serial work performed on behalf of a user request by slow single-threaded cores will dominate overall execution time."
In publicly backing Tilera and SeaMicro, Facebook's Heiliger seems to take a different view. We find it hard to believe the company will go that far, but stranger things have happened. And the company has said it's introducing a software change as well. One thing's for sure: it wouldn't even be semi-accurate to say the company is moving to ARM. ®