Facebook says vendor secrets forced it to homebrew switches

Network Veep spills how switch vendors keep diagnostic tricks to themselves


It's four months since Facebook first launched its Wedge switch and accompanying FBOSS operating system. Some forms of Wedge are in production and others are in testing, so El Reg decided to talk with Facebook's VP of network engineering, Najam Ahmad, to see where The Social Network is at with its software-defined networking (SDN) efforts.

As readers will remember, Facebook's decision to put forward its open network and SDN credentials came in June, with a promise that the results would become part of the Open Compute Project.

While it's attractive to attribute the motivation for the initiatives to raw cost, Ahmad told The Register that elusive quality called “agility” was at least as important in the decision to pursue home-baked SDN. Unlike many of the people that deploy the term, Ahmad was also willing to talk through an example (without, it must be said, pointing the finger at a particular vendor).

The “vertical network vendors”, he said, are hamstrung when a fault touches too closely on the crown jewels of their intellectual property.

“One reason we started driving this was that in our memcache environment, we were seeing small but consistent failure rates across our data centres,” he explained.

“We did troubleshooting for about three weeks, but couldn't figure it out – and we had a bunch of smart people trying.”

How to design a kludge*

It was only when a developer from the switch vendor was on site that diagnostics started to emerge – because, unlike anyone in Facebook, the vendor engineer was able to log into the ASIC driving the switch to access its diagnostics and discover that the chip was causing packet loss.

“How would we know that?” Ahmad asked rhetorically. “There's no counter, and the command is hidden.”

The fix wasn't just slow, he told El Reg, it was an out-and-out kludge: with access to the secret command, a Facebook engineer had to write a script that logged into every individual ASIC in the data centre, ran the secret command, gathered the data via screen-scrapes, and parsed the screen-scrapes so the data could be analysed.

“That took about three weeks, and in the end it was a kludge,” he said.
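
To give a feel for the parsing half of that kludge, here is a minimal Python sketch. The article does not reveal the hidden command or what its output looked like, so the sample text, the column layout and the names SAMPLE_SCRAPE, parse_scrape and ports_with_loss are all assumptions made up for illustration. It is not Facebook's script.

```python
import re

# Hypothetical screen-scrape of a hidden diagnostic command. The real
# command and its output format are not described in the article.
SAMPLE_SCRAPE = """\
switch-01# <hidden diagnostic command>
port  rx_pkts   drops
xe0   1200345   0
xe1   998120    17
xe2   1500001   3
switch-01#
"""

# A data row is assumed to be: port name, packet count, drop count.
ROW = re.compile(r"^(?P<port>\S+)\s+(?P<rx>\d+)\s+(?P<drops>\d+)\s*$")


def parse_scrape(text):
    """Return {port: drops} for every data row found in the scraped text."""
    drops = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # prompts and the header line do not match and are skipped
            drops[match.group("port")] = int(match.group("drops"))
    return drops


def ports_with_loss(text):
    """Return the sorted names of ports that reported at least one drop."""
    return sorted(port for port, n in parse_scrape(text).items() if n > 0)
```

Even in this toy form the fragility is plain: the regular expression is tied to one screen layout, so any change to the vendor's output would silently break the parser. A documented counter or API would not have that problem.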

Getting rid of the proprietary silicon, replacing it with switches built on merchant silicon (since people like Broadcom want their OEMs to make the chips sing and dance), and controlling those switches from a generic x86 server breaks into the world once dominated by the big vertical switch vendors, Ahmad told us.

Make a bot

That example – and it's one he has raised in other forums at varying levels of detail – is merely the tip of the iceberg, since “we learn new failure scenarios every day of the week”.

There's simply too much infrastructure to manage in anything like real time, Ahmad said. “Our philosophy is that we want robots to manage the networks, and we want people to build the robots.”

The human management model – which has been Network Management 1.01 since the days of SunNet Manager – has people watching alert consoles or getting paged by the system, identifying the device that's gone dark, logging in, troubleshooting it, mitigating the problem, then returning the device to service.

Facebook's approach is to send the alert to software that analyses it and assesses its impact.

If there's no immediate impact (for example, if it's one Ethernet port serving one of a dozen load-balanced servers), the robot will open a ticket without human involvement. Site services will then respond to the ticket in their own time.

But if there is an impact, the robot will raise a human immediately.
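
The triage rule described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not Facebook's code: the Alert fields, the healthy_peers measure of impact and the min_healthy_peers threshold are all hypothetical, chosen to mirror the "one port out of a dozen load-balanced servers" example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    device: str          # hypothetical: name of the switch raising the alert
    port: str            # hypothetical: affected port
    healthy_peers: int   # assumed: load-balanced servers still serving


def triage(alert, min_healthy_peers=1):
    """Return "ticket" when redundancy absorbs the fault, else "page".

    "ticket" stands for opening a ticket with no human involved;
    "page" stands for raising a human immediately.
    """
    if alert.healthy_peers >= min_healthy_peers:
        return "ticket"
    return "page"
```

With these assumptions, a port failure that leaves 11 of 12 servers in service yields "ticket", while a failure that leaves none yields "page". A real robot would need a far richer notion of impact than a single counter.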

However, those robots can only be written if a vendor exposes enough of its APIs – or if the switch is open from the start. If a new failure mode is discovered, “we want to be able to build that robot immediately and deploy it”, rather than waiting six months for a vendor response.

The barest metal

All of this led The Register to wonder: what is the barest metal that can be deployed as a “bare-metal” switch, and still have something you can call a switch?

It starts with the capabilities in the merchant silicon packet processor, Ahmad answered. “On top of that, you need an operating system that allows you to manage the device and configure it, and a protocol stack so you can tell the chip what you need it to do.

“Those are the two components of software that are needed.”

In the Open Compute Project, he said, the goal is that the switch hardware is nothing but a box and a bootloader (the project adopted Cumulus Networks' ONIE for this purpose in 2012), and that you choose which operating system to load on it, whether it comes from vendors like Cumulus Networks or Big Switch, some other OEM, or is a D-I-Y effort.

Linux provides the libraries and user processes needed to run the protocols a site needs, and the setup allows other capabilities to be built or contributed without depending on vendors' processes. Facebook, he said, has built a monitoring system that it's now in the process of packaging as a library for an open-source release.

“That's where Wedge is going today. The Wedge hardware was designed from ground-up – redesigned the motherboard as a modular system, with a lot of features/functionality driving towards disaggregating and managing the devices, much like we manage servers,” he said.

Over time, he said, key components of the accompanying FBOSS operating system will also be open sourced. ®

*Bootnote: Yes, that was an intentional hat-tip to a seminal piece of computer industry humour first published in Datamation in 1962, and preserved online in several places. Certainly, building a script to screen-scrape internal ASIC commands has the “certain, indefinable, masochistic finesse” that Jackson Granholm demands as the defining characteristic of the kludge. ®


