Calxeda boasts of 5 watt ARM server node
Includes memory and interconnect fabric
How does a server node including a processor, memory, and fabric interconnect that only consumes 5 watts under load grab you? How about 120 server nodes in a 2U chassis?
ARM server chip startup Calxeda, formerly known as Smooth-Stone, is lifting the veil a bit on its future processors, which are in development now. Calxeda is attending the GigaOM Big Data 2011 conference in New York next week and wants to start building momentum for its future chips, which it hopes server makers will pick up and make into servers in the coming years. Calxeda announced its name change and some vague information about its chips back at the SC10 supercomputer conference in November, and told El Reg it was working on a "server on a chip" design that would result in an ARM server being half the cost of an x64 box, with it using one-10th the energy and occupying one-10th the space.
Karl Freund, the vice president of marketing at Calxeda who came on board from IBM's System z mainframe division last year and who has done marketing for Hewlett-Packard workstations, Cray Research supercomputers – and IBM Unix boxes before that – tells El Reg that the Calxeda design is a bit different from the future server chips that are expected with the Cortex-A15 processors from ARM Holdings. It is also different from the variants that ARM licensees are expected to bring to the market in 2013 or so. Freund says Calxeda can't wait that long, and more importantly, that there is no reason to.
This kind of talk is what you would expect from Barry Evans, Calxeda's CEO, and co-founders Larry Wikelius and David Borland. Evans ran Intel's low-power x86 and Xscale ARM RISC chip businesses, while Wikelius designed chips and servers at Newisys and Borland has designed chips of various sorts for Marvell, Intel, and Advanced Micro Devices.
In August 2010, Calxeda had raised $48m in funding from ARM Holdings, Advanced Technology Investment Company (AMD's fab partner), Texas Instruments, Battery Ventures, Flybridge Capital Partners, and Highland Capital Partners. True to the David-versus-Goliath image of its original name, Calxeda definitely wants to throw stones, and especially at people who live in glass houses.
The precise feeds and speeds of the Calxeda chips are still not known, but Freund knows from his days of marketing Power systems at Big Blue that putting out some details can prep a market for consumption – as IBM most certainly did in the two years before its dual-core Power4 chips hit the market in the autumn of 2001. That Power4 chip and its successors turned IBM from a joke in the Unix server business to the dominant Unix system supplier a decade later.
"Anybody with a few million dollars can produce an ARM chip. So what makes us different?" Freund asks.
Not the core, that's for sure. Calxeda is starting with the Cortex-A9 core, which is the current 32-bit part that can be licensed from ARM Holdings. (You can license the virtualization-assisted Cortex-A15 part from ARM Holdings, too, which stretches physical memory addressing to 40 bits, but that won't be a production product for another two years or so.)
"What you don't want to do with a server chip is build your first product on a technology that is not proven yet," says Freund. "You have to wait for the design to settle down, as the Cortex-A9 has. If you're out on the front end of things, you can get into trouble."
Freund confirmed to El Reg that Calxeda is working on a quad-core Cortex-A9 processor with an integrated DDR3 memory controller and a homegrown fabric interconnect for the chips. The A9 core has integer and floating point units as well as a DDR2 memory controller and an L2 cache controller that can span from 128 KB to 8 MB, depending on what you want. Calxeda is adding DDR3 controllers and an unknown amount of cache. ARM Holdings' whitepaper on the Cortex-A9 chips suggests that L1 caches be set at 32 KB for instructions and 64 KB for data for networking and home gateways, with anywhere from 512 KB to 2 MB for L2 cache shared by the cores, and this is likely to be the shape of Calxeda's server chip. It is not clear if Calxeda will leave in the Media Processing Engine (MPE), but it seems likely that the floating point units will stay. The DDR3 memory controller will support ECC memory, of course, because this is a server, not a PC or tablet.
Calxeda is also cooking in that homegrown interconnect, which has yet to be given a name outside of the company. It is not clear how the Calxeda interconnect will hook into the Cortex-A9 chip, but that ARM design allows for two 64-bit Advanced Microcontroller Bus Architecture (AMBA) Advanced Extensible Interface (AXI) ports, with a combined 12 GB/sec of bandwidth into the system interconnect on the chip. It may be that Calxeda is implementing a whole different protocol – perhaps InfiniBand or 10 Gigabit Ethernet – right down on the chip and hooking it into the AXI ports. This would be the simplest and cheapest thing to do.
Because the Cortex-A9 is only a 32-bit processor, the Calxeda server nodes will top out at 4 GB of main memory per node. That is the upper limit of addressability for a 32-bit processor, of course, and in this case, it will be a single 4 GB stick of low-power DDR3 memory in a single slot.
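For anyone who wants to check the arithmetic behind that ceiling, here is a minimal sketch. It assumes a flat address space with one byte per address; the function name is ours, not anything from Calxeda or ARM. The 40-bit figure is included only for comparison with the Cortex-A15 addressing mentioned earlier.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat address of the given width (one byte per address)."""
    return 2 ** address_bits


# 32-bit addressing, as on the Cortex-A9
a9_limit_gib = addressable_bytes(32) // GIB

# 40-bit physical addressing, shown only for comparison
wide_limit_gib = addressable_bytes(40) // GIB

print(f"32-bit limit: {a9_limit_gib} GiB")
print(f"40-bit limit: {wide_limit_gib} GiB")
```

In practice some of a 32-bit address map is usually reserved for devices, so the usable figure can come in a little under the theoretical one.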
Freund says that a quad-core A9-derived processor, plus its memory controller, the DDR3 memory module, and the on-chip fabric interconnect will burn only 5 watts. That is less juice than a fat DDR3 memory stick uses, never mind an Intel or AMD x64 chip. Clock speeds were not divulged, but they will probably be somewhere between 1 GHz and 2 GHz.
"This gives us extremely high levels of density," says Freund. And, the fabric interconnect will allow for "multiple thousands of cores" to be lashed together and controlled as a unit. (But not in a cache-coherent, shared memory manner. Don't get the wrong idea.)
The Cortex-A9 does not have any circuits to do virtualization, but Freund says that on the workloads that Calxeda expects customers to use the chip for, they won't need hypervisors to carve up the servers. They will already have parallelized workloads that span thousands of nodes that run at very high utilization rates. On an x64 server, you use a hypervisor to plunk multiple server images on one set of chips, workloads that might only consume 5, 10, 15, or 20 per cent of the raw CPU capacity by themselves, driving up utilization of the overall system.
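The consolidation argument is easy to put into numbers. The sketch below uses the four utilization figures quoted above as a hypothetical set of workloads and assumes, generously, that a hypervisor adds no overhead and that memory and I/O are not the bottleneck.

```python
# Hypothetical workloads, each consuming this share of one x64 server's CPU
workload_cpu_percent = [5, 10, 15, 20]

# One workload per physical server: average utilization across the fleet
servers_without_hypervisor = len(workload_cpu_percent)
average_utilization = sum(workload_cpu_percent) / servers_without_hypervisor

# All four stacked on one server (assumes zero hypervisor overhead)
consolidated_utilization = sum(workload_cpu_percent)

print(f"{servers_without_hypervisor} servers at {average_utilization}% average")
print(f"1 server at {consolidated_utilization}% when consolidated")
```

A workload that already keeps thousands of small nodes busy has no such slack to reclaim, which is the point Freund is making.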
That said, hypervisors and their control freak add-ons are also useful for managing workloads and spreading running workloads around a cluster of machines. Freund says that Calxeda is participating in the OpenStack cloud fabric effort to see how to adapt these tools to manage bare-metal images instead of virtual images on machines using its ARM variants. The Linux community is also working on software container technology for ARM chips, according to Freund, which could be useful for some workloads.
Calxeda is not going to make and sell servers, but rather make chips and reference machines that it hopes other server makers will pick up and sell in their product lines. The company hopes to start sampling its first ARM chips and reference servers later this year. The first reference machine has 120 server nodes in a 2U rack-mounted format, and the fabric linking the nodes together internally can be extended to interconnect multiple enclosures together.
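To get a feel for the density on offer, here is a back-of-the-envelope sketch using the figures quoted in this story: 120 nodes per 2U chassis, 5 watts, four cores, and 4 GB per node. The 42U rack height is our assumption, and the power numbers cover the server nodes only, not power supplies, fans, disks, or switches.

```python
# Figures quoted in the article
NODES_PER_CHASSIS = 120
WATTS_PER_NODE = 5
CORES_PER_NODE = 4
GB_PER_NODE = 4
CHASSIS_HEIGHT_U = 2

# Assumption: a standard full-height rack filled with nothing but these chassis
RACK_HEIGHT_U = 42

chassis_watts = NODES_PER_CHASSIS * WATTS_PER_NODE
chassis_cores = NODES_PER_CHASSIS * CORES_PER_NODE
chassis_memory_gb = NODES_PER_CHASSIS * GB_PER_NODE

chassis_per_rack = RACK_HEIGHT_U // CHASSIS_HEIGHT_U
rack_nodes = chassis_per_rack * NODES_PER_CHASSIS
rack_cores = chassis_per_rack * chassis_cores
rack_watts = chassis_per_rack * chassis_watts

print(f"Per chassis: {chassis_cores} cores, {chassis_memory_gb} GB, {chassis_watts} W")
print(f"Per rack: {rack_nodes} nodes, {rack_cores} cores, {rack_watts / 1000} kW")
```

On those assumptions a single rack lands in the "multiple thousands of cores" territory that Freund describes.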
The initial workloads that Calxeda is targeting include internet-scale web serving, of course, as well as streaming content delivery (so long as it doesn't need compute-intensive DRM), small web application hosting, storage controllers, and big data analytics.
"NoSQL and MapReduce are a beautiful fit for these servers because of the ratio of CPU, memory, and disk and the performance per watt," says Freund. ®