Facebook knuckle-raps Intel, AMD
'The server vendors have failed us'
While social networking site Facebook doesn't seem inclined to pull a Google and build all of its own servers, the company's top techies are frustrated enough with the current crop of x64 boxes that they may be giving the idea some thought.
Speaking at the Structure 09 conference in San Francisco yesterday, Jonathan Heiliger, vice president of technical operations at Facebook, first expressed his disappointment with chip makers Intel and AMD and then server vendors in addressing the infrastructure challenges the company faces. (You can see a video of Heiliger's chat here).
"The biggest thing that surprised us - or is about to surprise us - is the less-than-anticipated performance gains from new microarchitectures," Heiliger explained in a Q&A session with GigaOM's Om Malik, referencing the new chips from Intel and AMD in particular. "The performance gains they are touting in the press, we are not seeing in our applications. We are literally in real-time trying to figure out why that is and if there are optimizations that we can do. Otherwise, we are kind of left with current-generation technology and current-generation scale."
Of course, it's unlikely that Facebook will return to using systems from one or two generations ago. But the situation may point out that the performance gains that Intel and AMD are showing on tests may not translate well into the heavily customized Facebook stack, which is written in PHP and uses memcached as a temporary cache store for data serving up its pages and MySQL as a permanent store for that data behind the caching servers. Or - and this seems more likely - it may simply point out the limits of the open source software Facebook has chosen to run its site.
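For readers unfamiliar with the arrangement, the read path described above can be sketched roughly as follows. This is not Facebook's code: plain Python dictionaries stand in for memcached and MySQL, and the `read` function and key names are hypothetical, chosen only for illustration.

```python
# Hypothetical stand-ins: plain dicts play the roles of memcached and MySQL.
cache = {}                                      # temporary store (memcached's role)
database = {"user:42": "profile for user 42"}   # permanent store (MySQL's role)


def read(key):
    """Serve from the cache when possible, else fall back to the database."""
    if key in cache:
        return cache[key], "cache hit"
    value = database.get(key)
    if value is not None:
        cache[key] = value  # populate the cache for later requests
    return value, "cache miss"


first = read("user:42")   # goes to the database, then fills the cache
second = read("user:42")  # served from the cache
print(first)
print(second)
```

The first request pays the cost of the database lookup; repeat requests for the same key are answered by the caching tier.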
Neither MySQL nor memcached is particularly good at scaling across many cores and threads, which is why companies like Facebook scale horizontally in the first place. The scalability issue is why a number of vendors have come out recently with multithreaded versions of memcached (the appliance made by Schooner, which launched in April, comes to mind), and it is also why Sun Microsystems has been touting that its future MySQL 5.4 (or rather, Oracle's future MySQL 5.4) will scale across 16 processor threads, whereas the current MySQL 5.1 only spans four threads.
With the new two-socket x64 boxes having 12 threads for an Istanbul Opteron and 16 threads for a Nehalem EP Xeon, it should not be all that much of a surprise to Heiliger that new iron is not making Facebook's caching and database tiers run faster. The software can't use the threads. And the only reason that MySQL 5.4 will be able to span up to 16 threads is that Google is contributing code to the MySQL project. (I have no idea how PHP is or is not making use of all those extra threads in two-socket Nehalem EP and Istanbul servers).
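To put rough numbers on that, here is a back-of-the-envelope sketch that takes the thread figures quoted above at face value. The function and dictionary names are made up for illustration, and real scaling is messier than a hard cap.

```python
def usable_threads(hardware_threads, software_limit):
    """Threads the software can keep busy, treating its limit as a hard cap."""
    return min(hardware_threads, software_limit)


# Hardware threads per two-socket box, as quoted in the article.
boxes = {"Istanbul Opteron": 12, "Nehalem EP Xeon": 16}
# Thread-scaling limits for each MySQL release, as quoted in the article.
limits = {"MySQL 5.1": 4, "MySQL 5.4": 16}

report = {}
for box, hw in boxes.items():
    for release, limit in limits.items():
        used = usable_threads(hw, limit)
        report[(box, release)] = (used, hw - used)  # (busy, idle)
        print(f"{release} on {box}: {used} threads busy, {hw - used} idle")
```

Under these assumptions, a single MySQL 5.1 instance leaves most of the threads in either box with nothing to do, which is consistent with Heiliger not seeing the advertised gains.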
With regard to the other pet peeve that Facebook has about its server infrastructure - power consumption - Heiliger heaped a certain amount of scorn on the world's server makers, who are no doubt constantly knocking on his door to try to win the next 10,000 server deal that Facebook does.
"I am not sure whether to be embarrassed or pleased for the OEM and system vendors in the audience," Heiliger said, "but you guys just don't get it." Heiliger said that it was not enough to just have servers with low-voltage chips and more efficient power supplies, but that server makers had to do what Google has done, which is to trim out every watt it can and minimize the wall power that a server consumes as it does its work.
"I am not sure why the server vendors have failed us," Heiliger added when asked why Facebook wasn't getting the machines it really wanted to buy. But he conceded that there is a bit of a chicken-and-egg problem in that server makers have to design for the belly of the market and that they have come around to doing custom designs for hyperscale customers like Facebook for orders of tens of thousands of nodes at a time.
Heiliger's own advice for providing scalability - the kind that a hyperscale company like Facebook needs - was simple, and it is fair to say that maybe Facebook needs to take its own advice a little more literally to be happy. "Pick an area where you are going to own - whether it is the application or the software infrastructure or the hardware infrastructure - and start with that area," Heiliger advised. "As you grow and as you scale, you can add the other two. You can start anywhere, but I would probably recommend starting with the application first."
Following this advice, it would seem that Facebook needs to buy lots more new x64 servers and to get down into the guts of its open source code and make it more scalable. Or design its own machines that give the best performance per watt on its existing code using older or at least different iron. Might I suggest some racks of Nano or Atom servers to start? ®
Architecture, not Open source
Or redesign your software architecture. Facebook isn't that hard a concept all up. But if you treat it like your typical three-page website multiplied 10 million times, expect to have an architecture that looks like a massive bloated shared hosting site.
Tim, stop bagging open source. They could have chosen inadequate proprietary software and achieved the same poor result. I did hit the "get more from this author" button, and this article didn't improve. It's not open source that's the problem, it's the choice of technology meeting a scale that Facebook never thought they'd ever get to. All those guys praising Google biscuit trays neglect to say that their pizza boxes are built to support scalable software architectures, using map-reduce and other tricks of the trade. Yahoo use Hadoop to get some of the same scalability, and others are working with concurrent languages like Erlang or Stackless Python and abandoning relational databases for more scalable storage structures.
There is plenty of good open source that helps users get scale; PHP and MySQL solve lots of problems and have been effective at scale. Get the architecture right first for the problem at hand. Otherwise the cost is what Facebook faces now: lots and lots of underperforming boxes.
Virtualise or make "slimmer" servers?
At the moment, the choices with such apps seem to be either stick it in a VM and pack more VMs onto a faster server, or make specialised servers like the Google biscuit-tray type. Given the cost that is involved in producing new designs, until there is a substantial market I don't think we'll see any Atom servers from the main vendors. Which really winds me up, as it would really simplify some of our infrastructure if I could put in some low-power Atom blades. Why x86? Because it means I can re-use existing x86 binaries, be they Linux or Windoze. And it always seems much more expensive in time and effort to change the app than to change the hardware.
A couple of ideas
Not my area of expertise, but ideas. If scaling horizontally doesn't run them into some other limit any time soon, here are a couple of ideas for them:
1. Quit using fast boxes and concentrate on maximising crunch per watt. This might point them to buying a vast number of (blade?) Atom servers. Not at all fast, but low power consumption. I can't remember who sells such but I'm sure I read about at least one such system.
2. Keep the fast hot boxes and give VMware a call (and/or one of the competitors). Buy enough extra RAM to split sixteen cores into (say) eight virtual machines with two cores each, or sixteen VMs with one core each. This basically takes some of the CPU's multithreading ability that they can't use up to the hypervisor level, which can use it effectively to multithread multiple VMs.
In passing: VMware enterprise stuff includes the ability to hot-move VMs between physical boxes, and to shut down / reboot boxes when load on the virtual machines drops such that not all the physical boxes are needed. Big power savings here?
If scaling horizontally CAN'T provide a long-term solution, they're just going to have to admit that their current system architecture can't scale up enough, bite the bullet, and re-engineer everything. If so, here's wishing them luck!