Calxeda plots server dominance with ARM SoCs
Prepping a MEEELLION-NODE fleet services enema for data centers
ARM server chip upstart Calxeda just bagged $55m in funding last week, and now we know what the company is going to do with the dough: plot a steady course to boost the performance of its ARM processors and the scalability of its on-die integrated Layer 2 distributed switch fabric until there is no reason to buy an x86 server chip.
Karl Freund, vice president of marketing at Calxeda, told El Reg that the funding would be used, in part, to do a follow-on EnergyCore chip (really a system on chip, or SoC, since it includes integrated switching) built around 40-bit Cortex-A15 cores based on the ARMv7 specs from ARM Holdings. The kit is due around this time in 2013, with the company's first 64-bit EnergyCore due around a year later in 2014 or so.
As it turns out, Calxeda's plans are considerably more ambitious than it was hinting last week. So much so that it might prompt in-state friend and foe Advanced Micro Devices to wonder why it didn't buy Calxeda instead of SeaMicro earlier this year. (There were unconfirmed rumors that AMD had in fact tried to do just that.)
Calxeda has raised $103m in two rounds of venture funding from Austin Ventures, Vulcan Capital, ARM Holdings, Advanced Technology Investment Company (which owns the GlobalFoundries chip fab), Battery Ventures, Flybridge Capital Partners, and Highland Capital Partners. It used some of that money to seed its initial EnergyCore design and will use the rest to significantly expand its reach – if it all goes according to plan over the next several years.
Calxeda is banking on the founders of the company – Barry Evans, who used to run Intel's low-power x86 and XScale ARM processor business; co-founder Larry Wikelius, who was at Opteron server maker Newisys; and co-founder David Borland, who was a chip designer at Marvell, Intel and AMD. The firm is hoping the trio can come up with clever cluster designs that scale across "warehouse-scale data centers," as Calxeda puts it.
When it was set up in January 2008, Calxeda was known as Smooth-Stone – named for the stone used by David to kill Goliath – and its founders are taking a very long view and waiting as patiently as they can as they promote a software ecosystem around ARM-based servers and the integrated switching and management features that their chips offer. While 64-bit processing is something that all ARM server chip makers want and need, the issues they are trying to wrestle with are a lot larger than 32, 40, or 64 bits.
The architecture of the Calxeda system on chip
With the 32-bit EnergyCore ECX-1000 processor announced last November, the goal of the chip and distributed L2 switch was to get a complete system on a chip based on ARMv7 cores into the field and scale that interconnect, then called the EnergyCore Fabric, across a rack and spanning 4,096 nodes. The chip also included an on-chip management coprocessor to optimize and manage power use within each node and across a rack-level cluster.
Freund tells El Reg that as the big web properties took a look at what Calxeda and its server partners Hewlett-Packard and Boston had built, they wanted more. "They said to us that over 4,000 nodes in a cluster was interesting, but then they asked us if we could do 100,000 nodes on a cluster," says Freund. "And then they asked us if we could do a million."
So with the coming generations of the EnergyCore chips, not only will Calxeda beef up the ARM cores on its processors, but it will bust that on-chip switching and management out of the rack and across "warehouse-scale" data centers – while rebranding the switching to Fleet Service Fabric Switch and the management engine to Fleet Services Engine. The future chips will also sport additional I/O controllers, potentially with integration to GPU coprocessors and other kinds of accelerators.
Calxeda will be refreshing the current EnergyCore sockets with a chip based on the Cortex-A15 design, code-named "Midway," to give customers 40-bit extended memory and support for hardware-based virtualization.
Generally speaking, the Midway chip is expected to deliver about 50 per cent more integer and 2X more floating point performance than the current ECX-1000 chip and support four times the memory (16GB per node and 4GB per thread), while sporting a new 2.0 release of the distributed L2 switch. Nothing comes for free, of course, so this Midway chip will deliver only the same or maybe slightly better performance per watt, according to Freund.
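For readers who want the arithmetic behind those address widths, here is a minimal Python sketch. The function and constant names are ours, for illustration only. The 4GB figure for the current ECX-1000 node is inferred from Freund's "four times the memory" remark rather than stated outright by Calxeda. The 4GB-per-thread number is consistent with a 32-bit virtual address space, though Calxeda did not spell that out.

```python
# Illustrative arithmetic only: how much memory a flat address of a given
# width can reach. Names here are hypothetical, not Calxeda's.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return the size, in GiB, of a flat address space of the given width."""
    return (2 ** address_bits) // GIB


# Per-node memory figures. 16GB for Midway is from the article; 4GB for the
# ECX-1000 is an inference from "four times the memory".
ECX1000_NODE_GIB = 4   # assumption, inferred
MIDWAY_NODE_GIB = 16   # stated in the article

if __name__ == "__main__":
    for bits in (32, 40):
        print(f"{bits}-bit addressing reaches {addressable_gib(bits):,} GiB")
    print(f"Midway node memory vs ECX-1000: "
          f"{MIDWAY_NODE_GIB // ECX1000_NODE_GIB}x")
```

In other words, 32 bits tops out at 4GiB, which is why a 16GB node needs the wider 40-bit physical addressing that the Cortex-A15 brings, and 40 bits leaves plenty of headroom beyond that.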
And despite all the chatter about how 32-bit processors are not useful in the modern world, Calxeda continues to believe otherwise and says it has the customers to prove it.
"With media streaming and media content services, you are pretty much just putting bits on a wire, and you don't need 64-bits for that," says Freund. "The EnergyCore ECX-1000 using the Cortex-A9 core is the right product for this. The Cortex-A15 doesn't replace the Cortex A9, which we think has a very long tail."
Freund adds that the first-generation Calxeda chip would make a good storage controller, too, for clustered disk arrays.
Aside from the core swap, the Midway chip will have the first generation of Fleet Services policy-based management and a set of APIs into the Fleet Services Engine for fine-grained allocation and resource control that spans an entire rack.
Midway will be available in 2013, perhaps around this time of the year. The company is not disclosing what process it is using to etch the chips or who the fab partner is that is doing the etching, but the current chips are made using 40 nanometer processes and are baked by Taiwan Semiconductor Manufacturing Company. It stands to reason that Calxeda will stick with TSMC and move on down to the 28 nanometer processes that the foundry has been ramping for the past year and should have in better shape a year from now.
Brawnier clusters on the way
Calxeda's EnergyCore roadmap plots course to brawnier clusters
About a year later comes the "Lago" SoC from Calxeda, and in case you are wondering, Calxeda is picking its code names from a map of Texas and using towns that have 1,000 people or fewer.
With Lago, Calxeda will move to the ARMv8 core, which has 64-bit processing and memory addressing and next-generation hardware virtualization circuits, among many other features. With this third generation SoC, Calxeda will add more cores, move to the 64-bit Neon floating point unit, offer faster single-thread performance, and add a significantly upgraded L2 switch interconnect that will be able to scale up to 100,000 nodes in a single cluster without having to add any external switches.
The company is not providing any more specs on this interconnect, but says it will have much higher bandwidth (obviously) and much lower latency.
This is a big deal, and it explains, more than anything else, why Intel has been buying up switching assets for the past year and handing paychecks to significant networking talent. Switching, not just network controller ports, is the next thing that is going to be integrated onto processors.
The Lago chip will sport about twice the performance (presumably compared to the Midway chip, and presumably on integer work) and boosted floating point performance (by how much, Calxeda is not saying). Again, no word on what chip process or what fab will be used for Lago, but it could end up being a 28 nanometer chip from TSMC as well.
EnergyCore clusters get bigger, chips get more powerful over time
But that's not all. Calxeda has two more SoCs in the works as it plots its data center domination. With the future "Ratamosa" chip, Calxeda will be able to take on enterprise applications and high performance computing, and Microsoft might even have a Windows Server variant out by then. The company is not saying anything about timing on the Ratamosa chip, but it stands to reason that it will keep to a yearly cadence and we should see it around the end of 2015.
And even further out, the fifth generation "Navarro" chip is in development, timed for the "enterprise server era," according to Calxeda, which is coded talk for when the Linux and possibly Windows operating systems and various hypervisors are fully mature on ARM processors. Although Calxeda did not confirm this, this would be the logical place to get a distributed L2 switch interconnect into the field that could handle the 1 million nodes that some customers are asking for.
The ARM server race is afoot. ®