SeaMicro adds Xeons to Atom smasher microservers
Now sporting brawny and wimpy cores
For the past 18 months, SeaMicro, the upstart maker of microservers based on Intel's Atom processors, has heard from x86 competitors trying to keep it out of hyperscale data center server deals that Atom cores are too wimpy for heavy-lifting workloads. But now the SM10000 line of microservers – actually more like a supercomputer cluster of minimalist microservers – is getting the brawny cores in Intel's Xeon E3 family of server chips, closing the gap with x86 alternatives and removing a big barrier to adoption for its machines.
The new SM10000-XE line is based on the same basic chassis as the three prior generations of SeaMicro machines: a 10U enclosure that supports 64 processor cards carrying varying numbers of 32-bit or 64-bit Atom processors – and now 64-bit Xeon E3s. Andrew Feldman, CEO of the server maker, tells El Reg that the new machine will burn half the power, offer three times the density, and provide twelve times the bandwidth of other "Sandy Bridge" class x86 server platforms.
"To give you an idea of how far we have come, our 10U system could replace 500 single-socket servers from five years ago," brags Feldman. And 20 of these new machines, Feldman adds, could have run online retailing giant Amazon for the first five years of its existence, when it grew to be a $2.7bn company. (By the way, that's a coded message. James Hamilton, a vice president and distinguished engineer at Amazon Web Services, the cloud computing juggernaut, has said that in 2011 AWS installed as much server capacity each day as Amazon had in place when it was a $2.7bn company in 2000. Hamilton said that just rolling in this number of servers each day is a challenge.)
The SM10000-XE won't run just any old Xeon, or even any Xeon E3 model. "We looked long and hard for the right CPU to attack bigger workloads, and a 2.4GHz Sandy Bridge core is a brawny core," says Feldman. "It's a boulder crusher." And, Feldman adds, the E3-1260L is a low-voltage part in the E3 family, which makes it the most power-efficient Xeon in that line. The E3 also has power efficiency advantages over the forthcoming Xeon E5 processors, due in the first quarter for two-socket servers, and the current Xeon E7 used in four-socket and eight-socket machines. Because the Xeon E3 chips are limited to a single socket per server node, they don't carry all the extra SMP and NUMA electronics needed to provide cache coherency across multiple sockets – electronics that burn up juice.
Jason Waxman, general manager of the Cloud Computing Group at Intel, tells El Reg that a SeaMicro server node using the quad-core Xeon E3-1260L will have about twice the SPECint_rate2006 integer performance of cards with six dual-core Atom N570 processors and the related NM10 Express chipset used in the SM10000-64HD configuration announced last July. The Xeon cores' extra integer oomph, along with larger main memories per node, is exactly what hefty Java and PHP applications – or even SQL database and NoSQL data store workloads – require.
Intel was not exactly thrilled about the prospect of companies deploying microservers using Atom processors, at least not initially, but it is coming around to the idea as companies like SeaMicro push it – and in preparation for the onslaught of ARM-based servers, which will probably take off in earnest late this year and early next.
"SeaMicro is pretty modest," says Waxman. "They were really the first company to push us hard on the Atom, and they are the first to develop a system that supports both Atom and Xeon."
SeaMicro's SM10000-XE chassis packs 64 nodes in 10U of space
SeaMicro is eager to slide in future "Ivy Bridge" Xeons and future "Medfield" Atom system-on-chip designs (which are expected to come with tweaks specifically for servers) when they become available. "We're looking forward to cranking out 22 nanometer Xeons and that Atom SoC," says Waxman, without mentioning Advanced Micro Devices or the ARM collective, which is going to be gunning for both processors to try to steal away server, workstation, and PC business.
In addition to supporting the Xeon E3 processors, the SM10000-XE line has some tweaks to the 3D torus interconnect that links server nodes in the chassis into a single, shared fabric, which in turn links to virtualized disk I/O and virtualized external Gigabit or 10 Gigabit Ethernet network ports. The new iteration of the interconnect, code-named "Freedom", has a new feature called TIO, short for Turn It Off. What this feature does is simple, but very important in hyperscale data centers, where power consumption is as big a problem as paying for physical servers.
New SeaMicro fabric shuts up Xeon chips
The Freedom interconnect has the same 1.28Tb/sec of aggregate bandwidth on that 3D torus as the prior generation, and it can support up to 16 external 10GE links or 64 external Gigabit links as well. That hasn't changed. Each socket in the Xeon machine obviously gets more bandwidth into and out of the interconnect, since there are 64 sockets sharing that 1.28Tb/sec instead of the 384 in the most capacious Atom machine that SeaMicro sells.
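The per-socket arithmetic here is simple enough to sketch. A minimal back-of-the-envelope calculation (illustrative only – actual delivered bandwidth depends on fabric topology and traffic patterns, and the naive even split is an assumption):

```python
# Naive per-socket share of the Freedom fabric's aggregate bandwidth.
FABRIC_TBPS = 1.28  # aggregate 3D torus bandwidth, Tb/sec

def per_socket_gbps(sockets: int) -> float:
    """Even split of the aggregate fabric across sockets, in Gb/sec."""
    return FABRIC_TBPS * 1000 / sockets

xeon_share = per_socket_gbps(64)   # SM10000-XE: 64 Xeon E3 sockets
atom_share = per_socket_gbps(384)  # SM10000-64HD: 384 Atom sockets

print(f"Xeon node share: {xeon_share:.1f} Gb/sec")  # 20.0 Gb/sec
print(f"Atom node share: {atom_share:.1f} Gb/sec")  # ~3.3 Gb/sec
```

By this rough measure each Xeon socket sees about six times the fabric share of an Atom socket in the densest configuration.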
But with the Xeon version of the SeaMicro box, engineers had to hack functions into the interconnect that reach into the Intel C200 chipset and the Xeon E3 chip and turn off all the junk on this Sandy Bridge silicon, which is also used for workstations and, in a slightly modified form, in high-end PCs. So, for instance, with the disk I/O all virtualized by the SeaMicro interconnect, there's no need to power up the SATA interface on the Intel chipset. Ditto for the USB ports on the chipset or the HD 2000 graphics controller on the Xeon E3 itself.
The upshot, says Feldman, is that the SeaMicro motherboard runs an E3 chip for server workloads more efficiently than other motherboards can, since they cannot power down these features. A fully loaded SM10000-XE with 64 Xeon E3-1260L processors, each with 32GB of DDR3 main memory, consumes somewhere between 3.2 and 3.5 kilowatts running heavy workloads, according to Feldman. So a rack with 256 nodes and 1,024 cores weighs in at around 12.8 to 14 kilowatts.
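The rack figure follows from packing four 10U chassis into a rack; a quick sketch of that math, using Feldman's numbers (the four-chassis-per-rack layout is an assumption implied by the 256-node figure):

```python
# Rack-level totals from the per-chassis figures Feldman cites.
CHASSIS_PER_RACK = 4      # four 10U chassis, per the 256-node rack figure
NODES_PER_CHASSIS = 64
CORES_PER_NODE = 4        # quad-core Xeon E3-1260L

chassis_kw = (3.2, 3.5)   # reported range under heavy load

nodes = CHASSIS_PER_RACK * NODES_PER_CHASSIS               # 256 nodes
cores = nodes * CORES_PER_NODE                             # 1,024 cores
rack_kw = tuple(kw * CHASSIS_PER_RACK for kw in chassis_kw)

print(nodes, cores)        # 256 1024
print(rack_kw)             # (12.8, 14.0) kilowatts per rack
```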
That Xeon E3-1260L chip has 8MB of on-chip L3 cache and a maximum Turbo Boost speed of 3.3GHz. It also supports HyperThreading, if your workload is amenable to simultaneous multithreading, and has a thermal design point of 45 watts.
SeaMicro is focused like a laser on performance per watt, and that is why it is not supporting other Xeon E3 chips. It is also why the company is teaming up with Samsung Electronics to use a SODIMM memory module built from 1.35 volt DDR3 chips etched in Samsung's 30 nanometer processes at a 4Gb density, which burns about a quarter of a watt per gigabyte. Regular 1.5 volt modules based on 1Gb chips etched in 50 nanometer processes burn about 1 watt per gigabyte, and moreover, those registered DIMM memory sticks are about twice as large – not particularly well suited to a dense-packed multinode cluster like the SeaMicro SM10000-XE.
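At full memory configuration, those per-gigabyte figures add up quickly. A rough comparison at 2TB per chassis, using the watts-per-gigabyte numbers above (illustrative arithmetic, not a measured result):

```python
# Memory power draw at full chassis configuration.
CHASSIS_MEMORY_GB = 64 * 32  # 64 nodes x 32GB each = 2,048GB

samsung_lv_watts = CHASSIS_MEMORY_GB * 0.25  # 1.35V, 30nm, 4Gb chips
standard_watts = CHASSIS_MEMORY_GB * 1.0     # 1.5V, 50nm, 1Gb chips

print(samsung_lv_watts)  # 512.0 watts
print(standard_watts)    # 2048.0 watts
```

By this arithmetic, the low-voltage SODIMMs shave roughly 1.5 kilowatts of memory power off a fully loaded chassis – a huge slice of a box that draws 3.2 to 3.5 kilowatts in total.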
The SeaMicro system is more than a bunch of server nodes linked by a 3D torus interconnect. That home-grown torus interconnect ASIC also virtualizes disk access and Ethernet networking for each of the Atom – and now Xeon – servers in the chassis. The system also has a field programmable gate array (FPGA) to do load balancing across all server nodes, and these circuits are hooked into the SM10000's system management tools to allow pools of servers to be grouped together and managed as a single object, and to provide guaranteed performance levels for groups of processors, disk, memory, and fabric. This load balancing function is called Dynamic Compute Allocation Technology, or DCAT. The SeaMicro chassis supports up to 64 SATA disk or solid state drives, plus Gigabit Ethernet (64) or 10 Gigabit Ethernet (16) uplinks out of the back of the box to reach the outside world. The server cards plug into the SM10000 chassis and backplane through two PCI-Express 2.0 slots each; the backplane currently supports 128 of these PCI slots for hooking in server boards, arranged in two columns of 32 slots on each side of the chassis.
The SM10000-XE, fully loaded with 64 quad-core Xeon processors and 2TB of memory, has a list price of $138,000. The current list price of the SM10000-64HD, with 384 dual-core Atoms, is $159,000.
SeaMicro is privately held and does not report revenues, but Feldman says that sales during its first full year of business beat those of Juniper Networks, Riverbed, 3PAR, and a bunch of other startups during their first full years of biz. "We are selling these boxes absolutely as fast as we can make them," says Feldman. "We revised our forecasts upwards twice last year and beat them."
There are some interesting future possibilities for the SeaMicro machines. First, SeaMicro could extend the torus interconnect to span multiple chassis. Second, it could put a "Patsburg" C600 chipset on an auxiliary card and make fatter SMP nodes out of single-processor cards, then link them into the torus interconnect. Finally, it could of course add other processors to the boards, such as Tilera's 64-bit Tile Gx3000s or 64-bit ARM processors when they become available. ®