ARM server hype ramps faster than ARM server chips
All the more time for Intel to get a leg up
Analysis If I didn't have to man El Reg's systems desk for a paycheck and had a little venture capital to blow, I might start a company called Leg Systems, headquartered on the Isle of Man – not because of its tax haven status (which is eroding), but because my company would sell ARM-based systems and say that we wouldn't charge an arm and a leg for them.
Let's be honest, that's probably not much less of a business plan than other startups have used to get venture cash.
ARM Holdings, the design and licensing company behind the ARM processor architecture, unmasked its 64-bit Cortex-A50 processor designs in October 2012, and AMD and Samsung Electronics have licensed those designs, while Cavium has licensed the underlying ARMv8 architecture. AMD and Cavium have admitted that they will be using ARMv8 chips in servers, and Samsung is widely believed to be working on server parts as well, but has not confirmed its plans. Marvell has aspirations in the ARM server space, too, and has Dell building experimental boxes using its ARM designs and related networking chips.
The battle pitting ARM chips against x86 processors in the data center – mostly Intel Xeons and now Atoms – is not just about low-energy processing, but also about virtualization, networking, and a more integrated data-center design.
If you are wondering why Intel spent the past year acquiring the supercomputer interconnect business from Cray, the InfiniBand business from QLogic, and the Ethernet business from the formerly independent Fulcrum Microsystems, it was to get access to interconnect experts and to figure out when and how interconnects – the next logical piece of the hardware stack – can be integrated onto the processor chip complex.
Don't expect Intel to put a Cray "Aries" XC interconnect on an Atom processor to make a network-ready chip for snap-together clusters, but do expect it to come up with some kind of on-chip interconnect that can compete against the ARM onslaught and protect the aspirations of Intel's Data Center and Connected Systems Group to rule servers, storage, and networking, and to double its business in these areas to $20bn annually by 2016.
As we discussed at length in November, former Intel chip boss and now VMware CEO Pat Gelsinger thinks that the future is ARM and Intel on the endpoints and Intel in the data center. Specifically, the analysis that Gelsinger's staff at EMC put together for the Hot Chips 24 conference shows that by 2015 most of the processor and chipset money will be either in the data center or on endpoints.
Mobile devices based on non-x86 architectures in the EMC model are expected to be the largest part of the IT ecosystem, pushing around $34bn in chip and chipset revenues, followed by mobile x86 devices (mostly laptops but some tablets and smartphones) driving maybe $27bn in revenues in CPUs and chipsets. That leaves x86-based servers driving around $18bn in revenues in 2015 and x86-based PC desktops with a mere $5bn in processor and chipset sales.
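The EMC figures are easier to weigh as shares of the whole. The sketch below simply tallies the approximate 2015 numbers quoted above and prints each segment's slice of the roughly $84bn total. The segment labels are our own shorthand, not EMC's. On these rounded numbers x86 servers account for a bit over a fifth of the pie, and non-x86 mobile gear for about two-fifths.

```python
# Approximate 2015 CPU + chipset revenue by segment, in $bn, as quoted
# from the EMC model (all figures are rounded estimates; labels are ours)
segments = {
    "non-x86 mobile": 34,
    "x86 mobile": 27,
    "x86 servers": 18,
    "x86 desktops": 5,
}

total = sum(segments.values())
for name, revenue in segments.items():
    share = 100 * revenue / total  # percentage of the combined total
    print(f"{name:15s} ${revenue:>3d}bn  {share:5.1f}%")
print(f"{'total':15s} ${total:>3d}bn")
```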
To Gelsinger's way of thinking, ARM on the endpoint and x86 in the data center becomes the new normal because of the size of the software investment on each side. But there is, as El Reg pointed out, another – and we think equally probable – possibility (with absolutely huge error bars) that companies will decide they want one software stack running on one platform. That could mean Intel wins on the smartphone and tablet endpoints, or it could mean that ARM wins in the cloudy data center and then backs its way into the corporate data center.
How this plays out will depend on many factors, not least the cleverness of the engineers behind ARM server chips and the software stacks that run atop them. And there is no shortage of smart alecks at the handful of ARM server chip upstarts. Here's who the players are and what we know of their plans:
Calxeda: This was the first silicon etcher to jump into the ARM server fray, back in November 2011, with a custom quad-core Cortex-A9 chip that integrated processing and interconnect onto a single die.
People have been monkeying around with baby ARM servers and Linux operating systems for a lot longer than this, of course, but the Calxeda EnergyCore ECX-1000 – which includes an on-chip distributed Layer 2 switch interconnect – sets the bar for the level of engineering and integration that will be required to supplant x86 processors and external switches in the data center.
The ECX-1000 chips are based on the ARMv7 spec and only sport 32-bit processing and memory addressing, which is fine for certain kinds of media processing, simple web serving, and even some big-data munching jobs that are more constrained by I/O than memory or CPU.
That said, companies have been writing 64-bit software for a long time and they don't want to go back, and 4GB of main memory for four cores is a bit skinny, even if the chip architecture does have a very sophisticated interconnect that can span 4,096 server nodes in a single cluster without using external switches.
This year, Calxeda will move to a Cortex-A15 core with a chip code-named "Midway" that sports 40-bit memory addressing, boosting the memory on a four-core chip to 16GB. This chip will also provide twice the performance, enhanced virtualization, and a more scalable implementation of that integrated fabric, which is now called the Fleet Service Fabric Switch.
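The jump from 32-bit to 40-bit addressing is bigger than the 4GB-to-16GB memory bump suggests. A flat 32-bit physical address space tops out at 4GB, while 40 bits reach 1,024GB, so the 16GB figure is what Calxeda quotes for Midway rather than a ceiling imposed by the address width. The sketch below assumes the article's "GB" means binary gigabytes (GiB), and the per-core figures simply divide the quoted per-chip memory by four cores.

```python
GIB = 2 ** 30  # assumption: the article's "GB" is read as binary gigabytes


def addressable_gib(address_bits: int) -> int:
    """Size of a flat physical address space of the given width, in GiB."""
    return 2 ** address_bits // GIB


# (label, physical address bits, quoted memory per chip in GiB, cores per chip)
chips = [
    ("ECX-1000", 32, 4, 4),
    ("Midway", 40, 16, 4),
]

for label, bits, mem_gib, cores in chips:
    print(f"{label}: {addressable_gib(bits):,} GiB addressable, "
          f"{mem_gib} GiB per chip, {mem_gib / cores:g} GiB per core")
```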
Sometime in 2014 – about a year after Midway ships – Calxeda will move to the ARMv8 core from ARM Holdings with its "Lago" system-on-chip, providing 64-bit processing and memory addressing. Lago will again double the performance of the processor (probably through more cores and not through clock-speed bumps) and add floating point processing in hardware as well as a third-generation on-chip interconnect fabric that will span more than 100,000 nodes.
Calxeda is at the moment only licensing the Cortex-A57 as the basis of its Lago chips, but it is possible that in the future it could employ the Cortex-A53 processors for certain workloads or employ the two different types of cores on the same die in the big.LITTLE approach championed by ARM Holdings.
Further out beyond that is an ARM SoC from Calxeda called "Ratamosa" that will also have performance enhancements, and will be aimed at full-on enterprise applications and supercomputing workloads. While no one will admit to this, Ratamosa is probably timed to coincide with the availability of a commercial-grade and field-tested Windows Server 2012 R2 update, which is the first possible version of Windows Server that Microsoft might field supporting both x86 and ARM processors. Microsoft could, of course, provide an ARM port of the baseline Windows Server 2012 and its key systems software such as SQL Server and Exchange Server any time it chooses. But for the moment, Redmond seems content to let Red Hat and Canonical lead in ARM support for their Linux distributions while it watches what develops.
Applied Micro Circuits: This company is backing into the server chip business from the networking chip and embedded processor markets where it has made its living, in the hope of carving out a big, juicy, profitable slice of the server racket.
The company launched its X-Gene multi-core SoC based on the ARMv8 design in October 2011, a year before ARM Holdings put out the full ARMv8 specs as embodied in the Cortex-A53 and Cortex-A57 reference designs.
Applied Micro wants to be first with 64-bit ARM servers and to build a sustained lead over its future rivals. The company provided more details on the initial X-Gene chips last summer at Hot Chips, and was showing off potential compute and storage server designs based on the X-Gene chip when everyone else was making ARMv8 announcements at last October's ARM TechCon 2012 event.
Applied Micro has not released the full specs of the X-Gene chip, but what we know is that it uses a two-core module as the basic building block of the SoC. The cores have a four-wide, out-of-order execution unit for integer work, include full virtualization support including nested page tables that hypervisors expect, and have their own L1 data and L1 instruction caches.
The core pair shares an L2 cache, and multiple pairs are ganged up to make a multicore system. A coherent network on the SoC delivers 160GB/sec of bandwidth and links core pairs to each other and to on-chip PCI-Express, networking, and SATA ports as well as to DDR3 main memory.
The initial X-Gene chip will be implemented in the 40 nanometer process from Taiwan Semiconductor Manufacturing Corp (which also etches Calxeda's ARM chips), and will top out at four core modules and eight cores running at a maximum of 2.5GHz.
Each eight-core chip will address up to 256GB of physical memory and will support 40GB/sec of networking I/O and 17 lanes of PCI-Express 3.0 bandwidth to carve up into slots. That on-chip interconnect fabric can be extended across 16 processor sockets, for a total of 128 cores in a single cluster image.
This initial X-Gene chip is supposed to sample in the first quarter with volume shipments at the end of 2013.
The next-gen X-Gene
The next generation X-Gene will be shrunk using TSMC's 28nm process, and will have a total of 16 cores running at 3GHz. The coherent network on the chip will extend out to 64 processor sockets in a glueless fashion to a maximum of 1,024 cores in a cluster image. While that is nowhere near the scale that Calxeda is talking about with clusters based on current and future EnergyCore SoCs, it is still a large number and something that cloud providers will take a hard look at if the chip and its interconnect work as advertised.
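The cluster-image core counts for the two X-Gene generations fall straight out of cores per chip multiplied by the number of sockets the coherent fabric can link gluelessly. The sketch below uses only the figures quoted above; the class and generation labels are hypothetical names of our own, not anything Applied Micro uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XGeneGen:
    """Hypothetical summary of one X-Gene generation's scaling figures."""
    name: str
    cores_per_chip: int
    max_sockets: int  # sockets linked gluelessly over the coherent fabric

    @property
    def max_cores(self) -> int:
        # Cores visible in a single cluster image
        return self.cores_per_chip * self.max_sockets


gens = [
    XGeneGen("X-Gene (40nm)", cores_per_chip=8, max_sockets=16),
    XGeneGen("next-gen X-Gene (28nm)", cores_per_chip=16, max_sockets=64),
]

for g in gens:
    print(f"{g.name}: {g.max_sockets} x {g.cores_per_chip} = {g.max_cores:,} cores")
```

Doubling the cores per chip and quadrupling the socket count is what takes the cluster image from 128 to 1,024 cores, an eightfold jump.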
That 1,024 core count is certainly a lot higher than Intel is promising as a single cluster image for either Xeon or Atom processors with integrated switching, since Intel has not promised any integrated switching – yet. All that it has said is that the future "Avoton" Atom S Series chips due later this year will include on-chip Ethernet controllers. Intel was hoping to build an external switch business that rivals its server and storage biz, and here are all of these ARM vendors talking about integrated switching.
Marvell: It has been more than two years since this chip maker launched its Armada XP ARMv7 derivatives aimed at servers, and the company has gotten some traction with its silicon.
Dell is not playing favorites in the early adopter phase of the ARM server market, and is shipping hyperscale servers based on Calxeda ECX-1000 processors called Zinc, which slide into its C8000 chassis, as well as the Copper server nodes based on Marvell's ARM processor and networking chips, which slide into its C5000 chassis.
The Marvell-based Dell machines use the 40-bit, quad-core Armada XP 78460 processor, based on its "Sheeva" family of SoCs. The Sheeva PJ4B cores run at a modest 1.6GHz, and each SoC can address up to 8GB of DDR3 main memory (which has ECC memory scrubbing).
There are four PCI-Express 2.0 controllers on the Armada chip, plus controllers to drive two SATA peripheral ports and four Gigabit Ethernet ports. The chip has a 4Gb/sec packet processor, perhaps useful for encrypting and decrypting data, and a controller to link to three USB 3.0 ports – all packed into a 15 watt thermal envelope. Dell chose the companion "Cheetah" Layer 2 network chip to link multiple Sheeva SoCs together.
This is the same basic Marvell chip technology that upstart server maker Codethink is using in its Baserock Slab ARM server, by the way.
Marvell got into the ARM chip racket after it bought Intel's XScale ARM chip biz back in 2006, when Intel's top brass decided to focus solely on the x86 architecture for its processors. Notably, when the big Cortex-A50 announcement was made at ARM TechCon 2012, Marvell was not one of the licensees. But a company can license the ARM architecture and create a custom chip rather than license an ARM Holdings reference design, as Apple and Nvidia do, to name just two, so Marvell could be working on a next-gen Armada XP aimed at servers.
Nvidia: Everybody was sure during the 2010 rumor mill that graphics chip maker Nvidia was going to get into the x86 processor racket, taking on Intel in PCs and servers. They were wrong.
What Nvidia did announce in January 2011 was an effort called Project Denver, which will create custom 64-bit ARM processors that the company will embed on future "Maxwell" GPU chips, due in 2013 and offering around five times the gigaflops per watt of the current "Kepler" series of GPUs.
Nvidia's Tegra line of smartphone and tablet processors is based on ARM architecture, so the company has plenty of experience designing its own ARM variants. To a certain extent, Project Denver is not something new so much as an effort to unite two parts of Nvidia onto a single die. Nvidia has not revealed much about the Denver effort, but CEO Jen-Hsun Huang said during a conference call going over Nvidia's Q3 numbers in early November that Denver "is going great" and that it was an ARMv8 processor with "some exciting secret sauce."
AMD: If AMD has a code name for its future ARM Cortex-A57 Opteron processors, El Reg doesn't know about it – but perhaps "Baking Soda" is a good name, seeing as how AMD hopes to take the best of its Opteron processors, which were originally known as the "Hammer" family, and match it up with ARM cores to make an x86 alternative for servers.
AMD has not yet provided a roadmap for its ARM-based server chips, but said last October, when the Cortex-A50s were launched by ARM Holdings, that the processors would carry the Opteron brand and that the "Freedom" interconnect fabric that is at the heart of its SeaMicro microservers would be embedded in some fashion on the chips, allowing "hundreds of thousands" of CPUs to be linked into a cluster over that fabric.
These ARM-based Opterons are expected in 2014, and while AMD didn't talk about it, it is also possible that AMD will roll out ARMv8 chips for the tablet and PC markets as well.
Cavium: This supplier of network processors based on the MIPS architecture is expanding into ARM chips through an effort called Project Thunder, launched last August. Details about Project Thunder are sketchy, but Cavium is a licensee of the ARMv8 architecture and says that it is building a custom chip rather than using either of the Cortex-A50 designs as the basis of its Thunder chips.
Cavium has processors based on MIPS architectures that span from two to 48 cores, and it will be interesting to see if the company will push up the core-count envelope, push down the clock speeds, and go after parallel and media workloads much as it has with its Octeon MIPS chips. The company says it is aiming at compute and storage workloads in the data center with the Thunder chips, and will give out more details about its plans in the future.
The word on the street in the wake of Cavium's announcement last summer was to expect Thunder chips in the second half of 2014. That seems like a million miles away, but it is when many other 64-bit ARM server chips are also expected.
Samsung: In many ways, Samsung is the wild card in the ARM server-chip racket. The company has big advantages over other ARM server chip providers in that it also makes and sells its own main and flash memory as well as disk drives. The Korean giant could do some very clever integration if it licensed or developed 3D packaging technologies.
Notably, Samsung was a licensee of the Cortex-A50 series of chips, but the company has not responded to chatter about its rumored plans to enter the server fray. All the company said back in October was that it was working on "future computing platforms." No kidding.
With so many companies working on processors and interconnects, the open source Linux operating system is a natural fit for the ARM chips, since chip developers and software providers can help get Linux working on all of these different machines.
But the variety may be as much a hindrance to the adoption of ARM chips in the data center as it is a help. The chip makers are going to have to get enthusiastic and consistent support from the Linux kernel coders and distro makers to support many of these chips before they become economic successes, and that is going to take a certain amount of money as well as time.
So the ARM server chips with the most enthusiastic Linux support may ramp up even if they are not necessarily the best technical option. For instance, an interconnect could be problematic to support with the Linux kernel and thereby limit its appeal – though we are not saying that this has happened or will happen.
Limiting ARM servers to Linux only is also an issue over the long haul, but is probably not a big deal in the short run. The hyperscale cloud providers that for the most part write their own code to provide their search, social media, and other services are, with the exception of Microsoft, already running on Linux. And if ARM takes off at Microsoft's competitors and provides a substantial cost or integration advantage, then that is when we can expect Redmond to roll out a Windows Server 2013 edition for ARM servers.
Microsoft will no doubt want to be its own first customer for such a software release. ®