ARM server hype ramps faster than ARM server chips
All the more time for Intel to get a leg up
Analysis If I didn't have to man El Reg's systems desk for a paycheck and had a little venture capital to blow, I might start a company called Leg Systems, headquartered on the Isle of Man – not because of its tax haven status (which is eroding), but because my company would sell ARM-based systems and say that we wouldn't charge an arm and a leg for them.
Let's be honest, that's probably not much less of a business plan than other startups have used to get venture cash.
ARM Holdings, the design and licensing company behind the ARM processor architecture, unmasked its 64-bit Cortex A50 processor designs in October 2012, and AMD, Samsung Electronics, and Cavium have licensed those designs. AMD and Cavium have admitted that they will be using these ARMv8 architecture chips in servers, and Samsung is widely believed to be working on server parts as well, but has not confirmed its plans. Marvell has aspirations in the ARM server space, too, and has Dell building experimental boxes using its ARM designs and related networking chips.
The battle pitting ARM chips against X86 processors in the data center – mostly Intel Xeons and now Atoms – is not just about low-energy processing, but also about virtualization, networking, and a more integrated data-center design.
If you are wondering why Intel spent the past year acquiring the supercomputer interconnect business from Cray, the InfiniBand business from QLogic, and the Ethernet business from the formerly independent Fulcrum Microsystems, it was to get access to interconnect experts and to figure out when and how interconnects – the next logical piece of the hardware stack – can be integrated onto the processor chip complex.
Don't expect Intel to put a Cray "Aries" XC interconnect on an Atom processor to make network-ready chips for snap-together clusters, but do expect it to come up with some kind of on-chip interconnect that can compete against the ARM onslaught and protect the aspirations of its Data Center and Connected Systems Group to rule servers, storage, and networking, and to double its business in these areas to $20bn annually by 2016.
As we discussed at length in November, former Intel chip boss and now VMware CEO Pat Gelsinger thinks that the future is ARM and Intel on the endpoints and Intel in the data center. Specifically, the analysis that Gelsinger's staff at EMC put together for the Hot Chips 24 conference shows that by 2015 most of the processor and chipset money will be either in the data center or on endpoints.
Mobile devices based on non-x86 architectures in the EMC model are expected to be the largest part of the IT ecosystem, pushing around $34bn in chip and chipset revenues, followed by mobile x86 devices (mostly laptops but some tablets and smartphones) driving maybe $27bn in revenues in CPUs and chipsets. That leaves x86-based servers driving around $18bn in revenues in 2015 and x86-based PC desktops with a mere $5bn in processor and chipset sales.
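As a rough check on how those four segments stack up against each other, the figures can be totted up in a few lines of Python. This is only a sketch: the segment labels are our own shorthand rather than EMC's, the dollar amounts are the approximate ones quoted above, and the four segments may not cover the whole of the EMC model.

```python
# Approximate 2015 chip and chipset revenues quoted above, in $bn.
# Segment labels are shorthand for this sketch, not EMC's own names.
revenue_bn = {
    "non-x86 mobile": 34,
    "x86 mobile": 27,
    "x86 servers": 18,
    "x86 desktops": 5,
}

total_bn = sum(revenue_bn.values())

# Share of the four quoted segments only, as a percentage.
share_pct = {
    name: round(100 * value / total_bn, 1)
    for name, value in revenue_bn.items()
}

for name, pct in share_pct.items():
    print(f"{name}: ${revenue_bn[name]}bn ({pct}% of ${total_bn}bn)")
```

On these numbers the two mobile segments together take a little under three-quarters of the money, and x86 servers about a fifth.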
To Gelsinger's way of thinking, ARM on the endpoint and x86 in the data center becomes the new normal because of the size of the software investment on each side. But there is, as El Reg pointed out, another – and we think equally probable – possibility (with absolutely huge error bars) that companies will decide they want one software stack running on one platform. That could mean Intel wins on the smartphone and tablet endpoints, or it could mean that ARM wins in the cloudy data center and then backs its way into the corporate data center.
How this plays out will depend on many factors, not the least of which is the cleverness of the engineers behind ARM server chips and the software stacks that run atop them. And there is no shortage of smart alecks at the handful of ARM server chip upstarts. Here's who the players are and what we know of their plans:
Calxeda: This was the first silicon etcher to jump into the ARM server fray, back in November 2011, with a custom quad-core Cortex-A9 chip that integrated processing and interconnect onto a single chip.
People have been monkeying around with baby ARM servers and Linux operating systems for a lot longer than this, of course, but the Calxeda EnergyCore ECX-1000 – which includes an on-chip distributed Layer 2 switch interconnect – sets the bar for the level of engineering and integration that will be required to supplant X86 processors and external switches in the data center.
The ECX-1000 chips are based on the ARMv7 spec and only sport 32-bit processing and memory addressing, which is fine for certain kinds of media processing, simple web serving, and even some big-data munching jobs that are more constrained by I/O than memory or CPU.
That said, companies have been writing 64-bit software for a long time and they don't want to go back, and 4GB of main memory for four cores is a bit skinny, even if the chip architecture does have a very sophisticated interconnect that can span 4,096 server nodes in a single cluster and without using external switches.
This year, Calxeda will move to a Cortex-A15 core with a chip code-named "Midway" that sports 40-bit memory addressing, boosting the memory on a four-core chip to 16GB. This chip will also provide twice the performance, enhanced virtualization, and a more scalable implementation of that integrated fabric, which is now called the Fleet Service Fabric Switch.
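The link between address width and memory ceiling is simple powers-of-two arithmetic, sketched below in Python. The function name is ours, purely for illustration. Note that a flat 40-bit address space works out to 1TB, so the 16GB quoted for Midway presumably reflects some limit other than address width, and we don't know what that is.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(address_bits: int) -> int:
    """Upper bound on byte-addressable memory for a given address width."""
    return 2 ** address_bits


# 32-bit is the ECX-1000 case; 40-bit is the width quoted for Midway.
for bits in (32, 40):
    print(f"{bits}-bit addressing: up to {addressable_bytes(bits) // GIB} GiB")
```

The 32-bit case gives the 4GB ceiling that makes the ECX-1000 a bit skinny for four cores.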
Sometime in 2014 – about a year after Midway ships – Calxeda will move to the ARMv8 core from ARM Holdings with its "Lago" system-on-chip, providing 64-bit processing and memory addressing. Lago will again double the performance of the processor (probably through more cores and not through clock-speed bumps) and add floating point processing in hardware as well as a third-generation on-chip interconnect fabric that will span more than 100,000 nodes.
Calxeda is at the moment only licensing the Cortex-A57 as the basis of its Lago chips, but it is possible that in the future it could employ the Cortex-A53 processors for certain workloads or employ the two different types of chips on the same die in the big.LITTLE approach championed by ARM Holdings.
Further out beyond that is an ARM SoC from Calxeda called "Ratamosa" that will also have performance enhancements, and will be aimed at full-on enterprise applications and supercomputing workloads. While no one will admit to this, Ratamosa is probably timed to coincide with the availability of a commercial-grade and field-tested Windows Server 2012 R2 update, which is the first possible version of Windows that Microsoft might field supporting both x86 and ARM processors. Microsoft could, of course, provide an ARM port of the baseline Windows Server 2012 and its key systems software such as SQL Server and Exchange Server any time it chooses. But for the moment, Redmond seems content to let Red Hat and Canonical lead in ARM support for their Linux distributions while they see what develops.
Applied Micro Circuits: This company is backing into the server chip business from the networking chip and embedded processor markets, where it has been making its living, in the hopes of carving out a big, juicy, profitable slice of the server racket.
The company launched its X-Gene multi-core SoC based on the ARMv8 design in October 2011, a year before ARM Holdings put out the full ARMv8 specs as embodied in the Cortex-A53 and Cortex-A57 reference designs.
Applied Micro wants to be first with 64-bit ARM servers and to build a sustained lead over its future rivals. The company provided more details on the initial X-Gene chips last summer at Hot Chips, and was showing off potential compute and storage server designs based on the X-Gene chip when everyone else was making ARMv8 announcements at last October's ARM TechCon 2012 event.
Applied Micro has not released the full specs of the X-Gene chip, but what we know is that it uses a two-core module as the basic building block of the SoC. The cores have a four-wide, out-of-order execution unit for integer work, offer full virtualization support, including the nested page tables that hypervisors expect, and have their own L1 data and L1 instruction caches.
The core pair shares an L2 cache, and multiple pairs are ganged up to make a multicore system. A coherent network on the SoC delivers 160GB/sec of bandwidth and links core pairs to each other and to on-chip PCI-Express, networking, and SATA ports as well as to DDR3 main memory.
The initial X-Gene chip will be implemented in the 40 nanometer process from Taiwan Semiconductor Manufacturing Corp (which also etches Calxeda's ARM chips), and will top out at four core modules and eight cores running at a maximum of 2.5GHz.
Each eight-core chip will address up to 256GB of physical memory and will offer 40GB/sec of networking I/O and 17 lanes of PCI-Express 3.0 bandwidth to carve up into slots. That on-chip interconnect fabric can be extended to a total of 16 processor sockets for a total of 128 cores in a single cluster image.
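The core counts fall straight out of the building blocks, as this back-of-the-envelope Python sketch shows. The variable names are ours, and the cluster memory figure assumes every socket carries the full 256GB, which Applied Micro has not confirmed.

```python
# Figures quoted for the initial X-Gene chip.
CORES_PER_MODULE = 2
MODULES_PER_CHIP = 4
MAX_SOCKETS = 16
MEMORY_PER_CHIP_GB = 256

cores_per_chip = CORES_PER_MODULE * MODULES_PER_CHIP
cores_per_cluster = cores_per_chip * MAX_SOCKETS

# Memory available per core if a chip is fully populated.
memory_per_core_gb = MEMORY_PER_CHIP_GB // cores_per_chip

# Assumption: every socket in the cluster carries the full 256GB.
cluster_memory_gb = MEMORY_PER_CHIP_GB * MAX_SOCKETS

print(f"cores per chip: {cores_per_chip}")
print(f"cores per cluster image: {cores_per_cluster}")
print(f"memory per core: {memory_per_core_gb}GB")
print(f"cluster memory at full population: {cluster_memory_gb}GB")
```

That is 32GB per core at full population, against the 1GB per core that the 32-bit Calxeda part can offer.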
This initial X-Gene chip is supposed to sample in the first quarter with volume shipments at the end of 2013.
COMMENTS
Re: Surely this level of competition is good?
@AC,
"Surely this level of competition is good?"
It certainly is!
Intel are in the process of being caught with their trousers round their ankles. The process is quite slow, and because Intel are used to innovating at a pace sufficient to outstrip AMD (and the others you mentioned) they've not seen ARM coming along.
My own view is that several things are happening all at once:
1) The absolute requirement for x86 binary compatibility has evaporated in the space of a single year (about four years ago).
The mobile revolution has shown that the software world is more than happy to abandon x86 binary compatibility. More importantly the Linux world simply laughs at the mere idea of it; most stuff compiles / runs just fine no matter what the underlying CPU architecture is (all other factors being equal).
2) The world has decided that power consumption really does matter. Running a big data centre takes a lot of juice, and that costs a lot of money now.
The mobile world has shown that ARM is the way to go to get good power consumption. Intel's problem is that it takes a lot more transistors (pipelines, instruction re-coding, enormous caches, speculative execution, etc) to make the X86 instruction set run well than are needed for ARM. Transistors consume power. It means that X86 has a built in disadvantage, with Intel staying 'ahead' only because they're really good at silicon manufacturing.
3) Everyone's realised that Intel's "Do everything in the CPU" is wrong.
ARM have comprehensively shown that Intel's philosophy is wrong. Most of the compute-intensive things that your mobile does (e.g. video decoding) run on dedicated co-processor cores or the GPU, not on the ARM core itself. That saves a bunch of power. The same will happen in ARM server chips, I expect.
So, what can Intel do?
There's not really a lot Intel can do about this now. The time to have acted was about 7 years ago. To really compete they'll have to go to an entirely different architectural design. That didn't work so well last time they tried. Itanium - remember that? Ironically, that failed because of a perceived need for binary compatibility. It might have succeeded if they had done a really cheap licensing deal with AMD long before launch.....
They could risk building a whole new architecture, but it could be difficult to persuade the software world to follow.
They could re-start building ARM devices themselves but that means following ARM, not leading them. However that's their best bet I think. Intel's silicon mastery could make Intel ARM chips much better than anyone else's, which would mean good profit.
Also they're not helped by the fact that 'ARM' doesn't really mean one single thing anyway. Watching Intel trying to make X86 compete with ARM is a bit like watching an elephant trying to swat a swarm of mosquitoes. Nowadays the mozzies' teeth are getting bigger...
Re: Surely this level of competition is good?
To go even further down the memory lane, Intel's other attempt at a new ISA was the i860. Readers of the grey-haired persuasion will remember articles in Byte [yes, the paper edition] and this chip appearing on boards, possibly alongside an i486 or used on the graphics add-on for the NeXT cube.
Re: Anything but certain
@ Sil,
"These advantages will mostly turn to disadvantages in the server space:
"End-users, in this case corporation, will care about the processor, its performance, compatibility and reliability more than performance per watt. And it remains to be seen that ARM has a performance per watt advantage over Intel offerings for server applications;"
I'm afraid I disagree - performance per watt is very important to the large players (Facebook, Twitter, Google, etc). There are many challenges ahead in getting ARM into the server space in a big way. However the really, really big driver is power consumption.
For the really big data centres it is the dominating cost; more than manpower, more than equipment costs, more than anything else. If they can make big inroads on power consumption by spending a bit more on hardware, design, staff, software, security, maintenance, etc. then they will do it. Their shareholders' demands will make them do it. And if someone makes that fairly easy for them (e.g. Calxeda) then that provider would be in a position to push their wares to the smaller data centre operators too, all of whom would like to save costs too.
Most of what Intel has done regarding power consumption has been based on improved transistor manufacturing, not fundamental architectural innovation. The X86 architecture with its instruction translation, pipelines, caches, branch prediction, speculative execution, etc. is at a disadvantage compared to ARM, which needs far less of that sort of thing. And less, so the saying goes, is more.
I think Intel's best bet would be to embrace ARM/server wholesale, and keep doing X86 too.
- It wouldn't actually cost that much, ARM are pretty reasonable on license fees apparently.
- It covers all the bases, which is a good way of not missing out
- It sends a clear message to its customers: look no further than Intel, they've got all your requirements covered in their catalogue.
- They're big enough that they could 'define the platform' in a way that the morass of ARM manufacturers has, as you rightly hint, failed to do.
- It's the old divide-and-conquer strategy, tried and trusted down the millennia
- They might finally get a look in on the mobile market
