Original URL: https://www.theregister.com/2014/03/05/facebook_lab_tour/

Inside Facebook's engineering labs: Hardware heaven, HP hell – PICTURES

Better duck, Amazon... Hardware drone incoming

By Jack Clark in San Francisco

Posted in Channel, 5th March 2014 11:01 GMT

10 years of Facebook

Facebook's hardware development lab is either a paradise, a business opportunity, or a hell, depending on your viewpoint.

If you're a hardware nerd who loves fiddling with data centre gear, ripping out extraneous fluff, and generally cutting the cost of your infrastructure, then the lab is a wonderful place where your dreams are manufactured.

If you're an executive from HP, Dell, Lenovo, Cisco, Brocade, Juniper, EMC or NetApp, the lab is likely to instill a sense of cold, clammy fear, for in this lab a small team of diligent Facebook employees are working to make servers and storage arrays – and now networking switches – which undercut your own products.

Meanwhile, if you represent Asian component manufacturers such as Quanta, Foxconn, and Wiwynn, you're likely to relish a trip to the lab, as it is here that Facebook designs its "Open Compute" gear, the designs of which it eventually makes available to the wider community. When these designs are published, the Asian companies are usually the ones that end up building and selling the hardware – at the expense of HP, Dell, Lenovo, and so on.

Obviously, El Reg had to take a tour and so earlier this month we took the Caltrain down to the social network's headquarters to ask Facebook some questions, the first of which was: what business does a social network have designing and building its own data centre hardware?

Lots, it turns out.

Though Facebook may seem like a trivial app in itself, the scale at which it operates – over a billion users, tens upon tens of petabytes of storage, three data centres around the world (and one under construction), each containing (El Reg estimates) hundreds of thousands of servers – means that it has had to rethink how it buys and consumes hardware to keep costs down.

The main insight the social network has had is that its workloads fall into about five distinct types, and therefore it needs only five server variants across its mammoth fleet.

These SKUs tend to be limited by a single bottleneck, with RAM or flash capacity the sticking point for database servers, HDD capacity for photo servers, CPU speed for Hadoop gear, and so on.

"The primary driver for evolution in our hardware SKUs is the primary component," explains Facebook's director of infrastructure Jason Taylor.

One of Facebook's distinctive Open Compute sled servers

For this reason, Facebook has always had a clear motivation to design servers that can be easily upgraded, without needing to remove them from the data centre or perform complicated maintenance. This has led to its sled-based server design (pictured), which makes quick maintenance possible while keeping costs down.

"The thing that we think about most across all of our hardware is what is the critical bottleneck we're building for," Taylor explains. "I think that for really the last four years or so we've been really good at being one of the first adopters of a new piece of tech when there's a significant change."

Where dreams are made

One of the main ways Facebook has achieved this speed is with its hardware lab, which allows it to refine existing designs and come up with new chassis to take advantage of different technologies.

By encouraging experimentation, Facebook lets its employees rapidly prototype ideas, allowing them to rethink how they arrange and configure hardware as they come up with use cases specific to the social networking giant.

A server development board being tested

As Facebook develops its servers it will order in development boards (pictured) from hardware partners to help it test the hardware. Sometimes it will let people interested in adopting Open Compute Project designs test the boards themselves, though from what we understand this isn't an official policy.

All aboard the sushi boat

The "Sushi Boat" server (below) is an example of why Facebook is confident that its software development mantra of "move fast, break things", has relevance in hardware as well.

This "Sushi Boat" server can fit in up to 80 SSD cards

The server (pictured) was put together by a team of Facebook engineers during one of the company's hackathons, after they found that the company had a large number of 2.5-inch laptop SSDs lying around.

Using a combination of balsa wood, cardboard, and briefly a MakerBot 3D printer, the team was able to mock up a prototype server that can cram in up to 80 SSDs.

"They designed the I/O system, the whole thing," explains director of hardware engineering Matt Corddry. "It's a really neat design."

The Sushi Boat drives are loaded via little containers

What stunned Facebook was that when it evaluated the design on how much power it consumed, how much it would weigh, and how many bits of storage it could hold, the prototype turned out to be a much better proposition than Facebook's existing "Cold Storage" servers.

"All the SSDs for the system were horribly engineered," explains Taylor. "If we were to then take the expectations we have for archival storage and map onto SSDs you could get a much higher bit density and lower performance. An SSD solution for archival storage is not at all absurd."

Though Facebook has no immediate plans to deploy this into production, Corddry did say the design would be "a backpocket thing for us," and Facebook could use the prototype to create a production server at short notice.

It's this combination of imaginative design and flexibility that convinces the social network it makes sense to operate a hardware lab.

Big Tin

Though much of Facebook's work involves engineers collaborating with equipment vendors to get components built to a specification, the company does some detailed manufacturing itself.

It's not all motherboards and chassis – Facebook's engineers also work with this 1950s metal milling machine

In another building at the company's HQ, El Reg found a 1950s metal mill next to a top-of-the-line Fortus 250mc 3D printer.

This Fortus 250mc printer lets Facebook rapidly prototype designs

These machines are both used to rapidly prototype bits of equipment. The 3D printer, for instance, is linked to Facebook's network so that engineers can order up print jobs from their desk.

"This is how you move fast - they have an idea and can model it out and see it in action," Corddry says. "For what we do [3D printers] are magical things for early in the prototype and hacking phase. Even little details - clips, tension mechanisms, you can mock them up. If your world is hardware, they're marvelous."

Garage band: 30 drives, one slim container

One good example of this approach yielding something of real value to the giant is found in "Knox" – an advanced storage chassis that allows Facebook to cram 30 drives into a svelte container.

An early version of Facebook's 'Knox' system

As befits Facebook's location in Silicon Valley, the Knox prototype (pictured) was first put together in the garage of Facebook engineer Jon Ehlen, he tells us.

By prototyping the design, Facebook was able to test the slide-out hard drive mechanism

This prototype used "plywood as the server tray, a 50lb weight to simulate the total weight of the device, and surfboard foam (pictured) to mimic the dimensions of the hard drives," he explained via email.

By building the prototype, the engineers were able to test out the characteristic Open Compute Project sled design, Ehlen says.

A later version of the prototype saw Facebook work with an ODM to mock up the final design

As the design progressed, the engineers moved on to other materials, going from a wooden mock-up to a full sheet-metal prototype (pictured) manufactured with an original design manufacturer.

"This was still "hacked" insofar as the design was very rough, and a Facebook mechanical engineer worked on-site at the sheet metal factory to modify and create new parts as they came off the sheet metal presses," Ehlen said.

The end result was Knox: a storage array now in production use at Facebook's Prineville, Oregon storage facility.

But for all the apparent usefulness of the hardware lab, it also seems like an ad-funded mecca for hardware fondlers. During our visit, Corddry told us about some of the company's less successful hacks, and said that about 10 months ago some engineers built a quadcopter strong enough to carry a hard drive around. "There was a quadcopter with a hard drive attached to it flying all over the campus late one night," Corddry said. Eat your heart out, Amazon. ®