Mystery startup uncloaks 512-core server
Atom bomb, network included
The mystery behind secretive server startup SeaMicro is dispelled today as the venture-backed maker of what it has been calling "data center appliances" unveils its first product: the SM10000, a server cluster composed of 512 of Intel's Atom processors with a built-in, virtualized network fabric for the servers.
The SM10000 passes the TPM Server Test of having an elegant design: namely, I want one, and I am not even sure why. I'll figure out what to do with it later. Probably something stupid, like turning it into a giant MapReduce box that uses log tables instead of floating point math units to do calculations, just to see if that would work. In recent years, the server designs from Fabric7, Liquid Computing, and 3Leaf Systems have all passed this test, as did the original "SledgeHammer" Opteron chips from Advanced Micro Devices and their clones, Intel's "Nehalem" processors from last year.
Ditto the Nvidia Tesla 20 GPU co-processors, the Power7 IH supercomputing nodes used in the "Blue Waters" super, and some of Sun Microsystems' very elegant Sun Fire designs from a few years back; so, too, for many Mini-ITX, Nano-ITX, and Pico-ITX system boards for homemade, low-power servers. Clearly, passing the TPM Server Test doesn't necessarily lead to riches, so it is of dubious value.
SeaMicro has obtained $25m in venture funding from Khosla Ventures, Draper Fisher Jurvetson, Crosslink Capital, and an unnamed private backer. The company was also, you will recall, one of the vendors that received a slice of a $47m grant awarded in January by the US Department of Energy to come up with greener technologies for the data center.
SeaMicro got the second-biggest slice of the DOE money, which was part of the $787bn Obama administration stimulus package, landing a $9.3m grant to field test a machine that puts hundreds of low-powered servers into a single box. In its proposal, SeaMicro said it could cut power consumption by 75 per cent compared to x64 alternatives. The rumor last fall was that SeaMicro was working on a server that would cram as many as 80 processors, perhaps Intel Atoms, perhaps ARM RISC chips, into a single chassis with a direct mesh fabric. The mesh is correct, but the processor count is way low.
The SM10000 does not have 10000 cores, as the name might seem to suggest, but does put 512 individual servers based on the single-core Atom Z530 processor into a 10U chassis, which is a neat trick. And one that the techies who used to work at AMD, Cisco Systems, Force10 Networks, Juniper Networks, and Sun were able to pull off.
SeaMicro was founded in July 2007 by Andrew Feldman, who formerly headed up marketing at Force10, and Gary Lauterbach, an AMD chip designer who was also responsible for putting together Sun's UltraSparc-III and UltraSparc-IV processors. Feldman and Lauterbach looked at the modern, hyperscale workloads that were starting to take over the data centers of the world and came to the conclusion that the complex x64, RISC, and Itanium processors - well suited to predictable workloads that solve complex problems within a single company's application mix in an orchestrated fashion - were wickedly unsuited for the relatively simple but massively scaled big data jobs that companies want to run efficiently and cheaply.
"The reason why power is now an issue is that workloads have changed in the data center," explains Feldman. "Now companies have smaller workloads, and they are bursty in nature. And today's systems are particularly bad because they have all these features that suck power - out-of-order speculation, branch prediction, and so forth - that are not particularly useful for these kinds of new workloads and that consume lots of power. The end result is that we are taking the Space Shuttle to the grocery store."
So SeaMicro looked at all kinds of low-powered, relatively simple processors that it might base its data center appliances on, including VIA Technologies' Nano, low-voltage x64 parts from Intel and AMD, and even ARM processors commonly used in handhelds and cell phones. While SeaMicro thought the future "Bobcat" processors from AMD were interesting, they would not get to market in time, and among the Nano, ARM, and Atom alternatives, Feldman says that the single-core Atom offers the best bang for the buck and the added benefit - some might say absolute requirement - of compatibility with the x64 architecture. By SeaMicro's reckoning, on Internet-style workloads - search, Map/Reduce and Hadoop, social networking apps, and such - the Atom core offers about 3.2 times the performance per watt of a Xeon or Opteron core. And the box can run Windows or Linux applications unchanged.
Getting the right CPU for the job was only one third of the battle, however, because in a modern server, processors only account for about a third of the total power consumption. Chipsets, memory, networking (including on-server network ports and the external switch), and peripheral I/O account for the other two-thirds of the juice that gets sucked out of the wall. And so SeaMicro created what is in essence a supercomputer interconnection fabric that also virtualizes the memory and I/O for tiny Atom-based servers, many of which are crammed onto a single motherboard, with many of these mobos plugged into the fabric using plain old PCI-Express links.
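A quick back-of-the-envelope calculation shows why a better CPU alone was not enough. The sketch below assumes, as a simplification, that the Atom's roughly 3.2 times better performance per watt means the processors draw 1/3.2 of their former power for the same work, and that everything else in the server draws exactly what it did before; the function name is illustrative, not anything SeaMicro uses.

```python
def remaining_power_fraction(cpu_share: float, cpu_efficiency_gain: float) -> float:
    """Return the fraction of original server power still drawn after improving only the CPU.

    cpu_share: fraction of total power used by the processors (about 1/3 per the article).
    cpu_efficiency_gain: factor by which CPU power falls for the same work (assumed 3.2 here).
    """
    non_cpu_share = 1.0 - cpu_share  # chipset, memory, networking, I/O: assumed unchanged
    return non_cpu_share + cpu_share / cpu_efficiency_gain


remaining = remaining_power_fraction(cpu_share=1 / 3, cpu_efficiency_gain=3.2)
print(f"power remaining: {remaining:.1%}, saving: {1 - remaining:.1%}")
# prints: power remaining: 77.1%, saving: 22.9%
```

Under these assumptions, even an infinitely efficient CPU would leave two-thirds of the power draw in place, which is why attacking the chipset, networking, and I/O mattered as much as picking the processor.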
That backplane virtualizes the networking and I/O for each Atom server and also includes an integrated switch, a load balancer, and a terminal server for all the servers in the box. This really is a single-box compute cluster, and it also has room for integrated disks.
The secret sauce in the SeaMicro design is an ASIC chip that virtualizes disk access and Ethernet networking for each of the Atom servers. The ASIC also implements a 3D torus interconnect between all of the server nodes, which is similar to the interconnect that IBM developed for its BlueGene massively parallel Linux supercomputer and which delivers 1.28 Tb/sec of aggregate bandwidth across the 64 server motherboards and 512 cores inside the SM10000 chassis.
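In a 3D torus, every node links to two neighbors along each of three axes, and the links at the edges wrap around, so no node sits in a far-off corner. The article does not say how SeaMicro maps its 512 nodes onto the torus; the sketch below assumes a hypothetical 8 x 8 x 8 arrangement purely to show the wraparound neighbor and hop-count arithmetic.

```python
from itertools import product

Coord = tuple[int, int, int]
DIMS: Coord = (8, 8, 8)  # hypothetical layout: 8 * 8 * 8 = 512 nodes


def torus_neighbors(node: Coord, dims: Coord = DIMS) -> list[Coord]:
    """Return the six directly linked neighbors of a node, wrapping at the edges."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]  # modulo gives the wraparound
            neighbors.append((coord[0], coord[1], coord[2]))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord = DIMS) -> int:
    """Minimum number of links between two nodes, going the short way round each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


print(torus_neighbors((0, 0, 7)))
# prints: [(7, 0, 7), (1, 0, 7), (0, 7, 7), (0, 1, 7), (0, 0, 6), (0, 0, 0)]

# The farthest any node can be from (0, 0, 0) is 4 hops along each axis.
diameter = max(torus_hops((0, 0, 0), n) for n in product(*(range(d) for d in DIMS)))
print(diameter)  # prints: 12
```

Without the wraparound links, a plain 8 x 8 x 8 mesh would have a worst case of 21 hops (7 per axis), which is the main reason torus designs such as BlueGene's favor them.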
SeaMicro also came up with its own field programmable gate array (FPGA) design to do load balancing across the machines very efficiently. The load balancing electronics are hooked into the SM10000's system management tools, allowing pools of servers to be grouped together and managed as a single object and providing guaranteed performance levels for groups of processors, disk, memory, and fabric - something that Feldman says virtualized x64 servers cannot do because they often oversubscribe resources to drive up utilization. This capability is called Dynamic Compute Allocation Technology, or DCAT.
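The article does not describe how DCAT works internally. As a toy illustration of the general idea Feldman contrasts with oversubscribed virtualization, here is a hypothetical pool that grants hard reservations and refuses any request that would exceed its capacity; `ResourcePool` and its fields are invented for this sketch and are not SeaMicro's API.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical group of servers managed as one object with hard reservations."""
    name: str
    capacity: dict[str, int]  # e.g. {"cores": 64, "disks": 8}
    reserved: dict[str, int] = field(default_factory=dict)

    def free(self, resource: str) -> int:
        """Capacity of a resource not yet promised to anyone."""
        return self.capacity.get(resource, 0) - self.reserved.get(resource, 0)

    def reserve(self, request: dict[str, int]) -> bool:
        """Grant the whole request or none of it; never oversubscribe."""
        if any(amount > self.free(resource) for resource, amount in request.items()):
            return False
        for resource, amount in request.items():
            self.reserved[resource] = self.reserved.get(resource, 0) + amount
        return True


pool = ResourcePool("web-tier", {"cores": 64, "disks": 8})
print(pool.reserve({"cores": 48, "disks": 4}))  # prints: True
print(pool.reserve({"cores": 32}))              # prints: False (only 16 cores left)
print(pool.free("cores"))                       # prints: 16
```

Refusing the second request, rather than squeezing it in, is what keeps the first group's guarantee intact; an oversubscribing scheduler would accept both and let them contend for the same cores.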
The combination of the ASIC and the FPGA removes 90 per cent of the components in a normal server stack, according to SeaMicro, so that a 10U chassis with 64 SeaMicro server boards can replace 40 1U, two-socket x64 servers, two Gigabit Ethernet switches, two terminal servers, and a load balancer - what you would cram into a standard 42U rack these days running hyperscale, Webby workloads. And the SM10000 will draw one quarter of the power and therefore require one quarter of the cooling.
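The consolidation claim is easy to put in numbers. The sketch below uses only the article's figures and counts just the 40U occupied by the x64 servers, since the article does not give the rack height of the switches, terminal servers, and load balancer.

```python
# Figures from the article.
x64_devices = {
    "1U two-socket servers": 40,
    "Gigabit Ethernet switches": 2,
    "terminal servers": 2,
    "load balancers": 1,
}
X64_SERVER_RACK_UNITS = 40 * 1  # the 40 1U servers alone; other gear not counted
SM10000_RACK_UNITS = 10
SM10000_SERVERS = 512

devices_replaced = sum(x64_devices.values())
space_ratio = X64_SERVER_RACK_UNITS / SM10000_RACK_UNITS
servers_per_u = SM10000_SERVERS / SM10000_RACK_UNITS

print(f"{devices_replaced} boxes -> 1 chassis")        # prints: 45 boxes -> 1 chassis
print(f"at least {space_ratio:.0f}x less rack space")  # prints: at least 4x less rack space
print(f"{servers_per_u:.1f} Atom servers per U")       # prints: 51.2 Atom servers per U
```

A single Atom server is far less capable than a two-socket x64 machine, so the per-U figure counts machines, not throughput.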
A server the size of a credit card
The basic unit of computing in the SM10000 server cluster is an Atom machine with four components: the Atom Z530 processor, which runs at 1.6 GHz and which has two threads for execution; the "Poulsbo" US15W chipset; the SeaMicro ASIC, for virtualizing I/O and implementing the fabric; and a SODIMM memory slot. This server is about 2.2 inches by 3 inches, with the memory module on one side and the other components on the other. That's reducing a server from the size of a pizza box to the size of a credit card. Here's how they lay out on a single SeaMicro server board:
The SeaMicro SM10000 server board.
As you can see from the picture above, the SeaMicro SM10000 server board has eight Atom servers (each with one processor and one chipset) on a single printed circuit board. The smaller chip is actually the processor and the larger, darker chip is the chipset. The four ASIC chips that virtualize the I/O and implement the interconnect are along the bottom, and SeaMicro has designed the mobo so it links back into the chassis using two absolutely standard PCI-Express 2.0 x16 slots, side by side. (Let this be a lesson to you proprietary blade server makers with your non-standard backplanes and interconnect electronics.) This board measures 5 inches by 11 inches.
The SM10000 chassis has 128 PCI-Express 2.0 x16 slots, arranged in eight vertical columns, four on the left and four on the right of the chassis. You plug in 32 boards (two columns of 16) on each side to get your 512 Atoms per chassis. Like so:
The SM10000, front and side view.
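The chassis is described both as eight columns of slots and as two columns of boards per side, which can be confusing. Because each board occupies two side-by-side x16 slots, the article's figures reconcile as follows; the constant names are just labels for those numbers.

```python
SLOTS = 128                 # PCI-Express 2.0 x16 slots in the chassis
SIDES = 2                   # left and right halves
SLOT_COLUMNS_PER_SIDE = 4   # eight slot columns in total
SLOTS_PER_BOARD = 2         # each board plugs into two side-by-side slots
SERVERS_PER_BOARD = 8       # Atom servers on each board

slots_per_column = SLOTS // (SIDES * SLOT_COLUMNS_PER_SIDE)        # 16
board_columns_per_side = SLOT_COLUMNS_PER_SIDE // SLOTS_PER_BOARD  # 2
boards_per_side = board_columns_per_side * slots_per_column        # 32
total_boards = boards_per_side * SIDES                             # 64
total_servers = total_boards * SERVERS_PER_BOARD                   # 512

assert total_boards * SLOTS_PER_BOARD == SLOTS  # every slot is used
print(total_boards, total_servers)  # prints: 64 512
```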
With each Atom server having its own 2 GB SODIMM, the chassis supports up to 1 TB of main memory across the 512 server nodes. The chassis has room for up to 64 SATA or solid state disk drives in the front (you always pull cold air over disks, so they need to be in the front). The disks and server boards are plug and play, so you don't have to reboot to add capacity. The servers need to talk to the outside world, of course, so the homegrown networking fabric and switch created by SeaMicro for the SM10000 has uplinks, which you can see here:
The back-end of the SM10000 server chassis.
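The memory and drive figures work out like this for a fully populated chassis. The article does not say how the virtualized drives are mapped to servers, so the servers-per-drive number is only an average.

```python
SERVERS = 512
DIMM_GB = 2        # one 2 GB SODIMM per Atom server
DRIVE_BAYS = 64    # SATA or solid state drives at the front

total_memory_gb = SERVERS * DIMM_GB       # 1,024 GB, i.e. 1 TB
servers_per_drive = SERVERS / DRIVE_BAYS  # on average, 8 servers per drive

print(f"{total_memory_gb} GB = {total_memory_gb / 1024:.0f} TB")  # prints: 1024 GB = 1 TB
print(f"{servers_per_drive:.0f} servers per drive bay")           # prints: 8 servers per drive bay
```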
The chassis offers different network modules, with 8 to 64 Gigabit Ethernet uplinks or 2 to 16 10 Gigabit Ethernet uplinks per chassis. The FPGAs implementing the load balancer, terminal server, and switching software are also in the chassis.
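Dividing the nominal uplink capacity across all 512 servers gives a feel for the range of configurations. This is line-rate arithmetic only: it ignores protocol overhead and the fact that traffic between servers inside the box stays on the internal fabric, and `per_server_mbps` is an illustrative helper.

```python
SERVERS = 512
FABRIC_GBPS = 1280  # 1.28 Tb/sec aggregate internal fabric bandwidth

# (ports, Gb/s per port) at the low and high end of each uplink option
UPLINK_CONFIGS = {
    "8 x GbE": (8, 1),
    "64 x GbE": (64, 1),
    "2 x 10GbE": (2, 10),
    "16 x 10GbE": (16, 10),
}


def per_server_mbps(ports: int, gbps_per_port: int, servers: int = SERVERS) -> float:
    """Uplink bandwidth per server if every server shared the uplinks equally."""
    return ports * gbps_per_port * 1000 / servers


for name, (ports, speed) in UPLINK_CONFIGS.items():
    total = ports * speed
    print(f"{name}: {total} Gb/s out, {per_server_mbps(ports, speed):.1f} Mb/s per server, "
          f"fabric-to-uplink ratio {FABRIC_GBPS / total:.0f}:1")
```

Even the fattest uplink option is an eighth of the internal fabric bandwidth, which suits workloads where most traffic flows between nodes inside the box.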
The whole box burns under 2 kilowatts of juice running real workloads, which is a quarter of the power that a rack of two-socket x64 boxes will burn.
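Spreading that power budget across the nodes shows how frugal each Atom server is. The x64 rack figures below are only what the "quarter of the power" claim implies, not measured numbers.

```python
CHASSIS_WATTS = 2000       # upper bound: "under 2 kilowatts" on real workloads
SERVERS = 512
X64_SERVERS_REPLACED = 40

watts_per_atom_server = CHASSIS_WATTS / SERVERS  # just under 4 W each
implied_rack_watts = CHASSIS_WATTS * 4           # if the chassis uses a quarter of the rack's power
# Implied draw per x64 server, including its share of the switches and other gear.
implied_watts_per_x64_box = implied_rack_watts / X64_SERVERS_REPLACED

print(f"{watts_per_atom_server:.2f} W per Atom server")  # prints: 3.91 W per Atom server
print(f"~{implied_rack_watts / 1000:.0f} kW for the x64 rack, "
      f"~{implied_watts_per_x64_box:.0f} W per box")    # prints: ~8 kW for the x64 rack, ~200 W per box
```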
The SM10000 will be available on July 30, with a base configuration running $139,000.
By the way, there is nothing about the SeaMicro architecture that precludes the company from supporting whatever processor architecture it wants. If someone wanted a bunch of servers based on ARM processors and was willing to pay for it, you can bet that SeaMicro could build it. Ditto for protocols and ports coming off the interconnect fabric. The architecture can support Fibre Channel or converged enhanced Ethernet, which allows for Fibre Channel to be run over 10 Gigabit Ethernet.
For now, Feldman says that SeaMicro is looking ahead to a time when Intel puts an entire Atom as well as its chipset, memory controller, and other goodies on a single piece of silicon. At that time, SeaMicro should be able to get a lot more servers and cores onto a single SM10000 system board. And the company will also eventually be able to link multiple SM10000 chassis together for integrated management, like stackable network switches do today.
The SM10000 took three years and many millions of dollars to develop and could be very quick (a lot depends on the software), but is nonetheless a complete unknown. That is not the kind of thing that endears any new technology to large, conservative customers. But the power and cooling problems at many hyperscale data centers are so bad that enthusiasm for the SM10000, which has been rumored since last summer, was quite high ahead of the launch.
"We have big orders," says Feldman, with a laugh. "And we have a good-sized backlog."
This might actually be a machine that Google buys instead of making itself. We'll see. ®