Intel's Avoton Atoms give microservers muscle – and Xeon-class features

Now we see if there is a real business here

ARM versus Atom

With the launch of the "Avoton" Atom C2000 server chips, Intel is putting its second generation of 64-bit, server-class Atom processors into the field – and what is arguably the first such Atom that is truly designed for modern server workloads.

The C2000 has enough computing oomph, enough memory capacity, and integrated memory, peripheral, and network controllers all on the die. It is truly a system-on-a-chip, akin to the Atom and ARM SoCs that are commonplace in phones, tablets and other handheld gear.

While the technology packed into the Avoton chips, and the "Rangeley" variants that have been tweaked specifically for use in network devices, is as crunchy as usual, perhaps the most important thing about the C2000s is something you can't see from the specs.

And that thing, says Ronak Singhal, senior principal engineer at Intel's Data Center and Connected Systems Group, is the cooperation between the Atom and Xeon processor development teams. Many key engineers work on both these days, and the chips will be etched in the same processes at more or less the same time going forward, too.

"We are taking the learnings from one project and applying it to the other," he said. "We have been working on Xeons for a long time, and everything that we learn about power and performance we can apply to what we are doing for our Atom SoC for servers.

"The things that we learn in phones and tablets in our Atom SoCs, those can migrate up to Xeon. We are learning to leverage those technologies very nicely, and we are doing a cross-pollination of people as well."

Don't get the wrong idea. Intel doesn't like to have a complicated product line. No IT vendor does. But because server, storage, and networking customers are coming to Chipzilla with increasingly divergent needs and they want to have processors, networking, and other aspects of the system tuned precisely for workloads – and within a particular budget, too – Intel has little choice but to be more flexible than it has in the past.

Die shot of the Avoton C2000 processor

"What somebody wants on the HPC side is very different from what a cloud service provider wants on the other side, and it tends to be pretty different from what a storage or a communications customer wants," explains Singhal.

"Obviously we want to satisfy the needs of each of these customers, and we are going to create more and more targeted solutions. The challenge for us is how do we create parts that serve the needs for all of these customers and do it in such a way that there is consistency across the features and do it in such a way that it is something that we can actually build. We can't create custom solutions for everybody in the world. We just don't have the scale to do that."

What Intel can do, however, is make a respectable Atom processor with many Xeon-ish features and add that to the mix of Xeon E3, E5, and E7 server chips as well as to the parallel Xeon Phi coprocessor. In doing so, it gets a much broader portfolio of CPUs than it had a decade ago, when it had Xeon and Itanium chips that were not binary compatible. (Yes, El Reg knows about the x86 emulation environment in the early Itaniums.)

The Avoton chip is implemented in Intel's current 22-nanometre TriGate wafer baking processes, just like the impending "Ivy Bridge-EP" Xeon E5 v2 chips will be. The Avoton chip package (not the die) is 34 millimetres by 28 millimetres in size. The prior "Centerton" Atom S1200 chips were etched in 32-nanometre processes. That shrink helps Intel cram a lot more onto the die, and also allows Intel to create more power-efficient designs.

Avoton is aimed at microservers, which are defined roughly as single-socket boxes with modest memory slot and peripheral expansion as well as a small physical footprint to jack up rack density and, in theory, a lower cost per unit of performance than fatter and more standard two-socket x86 machines. In other words, it is aimed at the same customers who had been mulling over the prior "Centerton" Atom S1200s as well as the impending 64-bit ARM server chips from Calxeda, Advanced Micro Devices, Applied Micro, Marvell, and a few others who may jump into the game (possibly even Samsung).

Rangeley is a tweak of Avoton that turns on the QuickAssist Technology (QAT) accelerator on the chip, which hooks into Intel's Data Plane Development Kit for network gear makers to juice AES, DES/3DES, Kasumi, RC4, and Snow3G ciphers, MD5, SHA1, SHA2, and AES-XCBC authentication, and Diffie-Hellman, RSA, DSA, and ECC public key encryption. This QAT coprocessor can process ciphers at 10Gb/sec. And by the way, not all of the Rangeley chips will have this QAT accelerator activated; to be precise, only four out of the eight SKUs will. The reason for this is that the QAT accelerator is a controlled substance and the US government has export controls on it.

The Rangeley chips will also be available for purchase from Intel for a much longer period of time than the Avotons, which is a requirement of network and telecom equipment makers, and also have enhanced thermal and reliability specs that these customers need before they put a processor into their gear. Network gear out there in the field is in a much harsher environment than the typical data center, although some data centers are running hotter to save on the electricity bill and it would not be surprising to see ruggedized servers using Rangeley instead of Avoton for military and other customers. Provided Intel doesn't charge too much of a premium for Rangeley.

Block diagram of the Atom C2000 processor

As Intel has previously disclosed, the Avoton and Rangeley Atom C2000 processors are based on the "Silvermont" architecture, which brings out-of-order execution to the Atom core for the first time and also does away with the old frontside bus architecture that Intel killed off in the Xeon chips back with the "Nehalem" Xeon 5500s in 2009. The Silvermont architecture also excels compared to the Saltwell design used in the prior Centerton Atoms in that the instruction pipeline has lower latencies and higher throughput and sports more efficient and accurate branch predictors and a faster recovery pipeline. The L1 and L2 caches on the Avoton chip also have lower latencies and higher bandwidth.

The Avoton core takes the 64-bit instruction set from the Core 2 processors and weaves in the SSE4.1, SSE4.2, POPCNT, PREFETCHW, AES-NI, and a few other instructions from the "Westmere-EP" Xeon 5600 chips. Each core has 32KB of L1 instruction cache and 24KB of L1 data cache. The cores are cookie-cuttered onto the die in pairs that have a shared 1MB L2 cache. The C2000 has support for VT-x2 virtualization, but does not support Intel's HyperThreading implementation of simultaneous multithreading to present each core as two virtual cores to the operating system. That VT-x2 support allows for extended page tables, virtual processor IDs, and unrestricted guests, and an instruction called VMFUNC allows code running in a guest partition to invoke hypervisor functions.
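For readers who want to check a box for the features listed above, here is a minimal Python sketch that looks for them in /proc/cpuinfo-style text on Linux. The mapping from feature names to Linux flag names and the SAMPLE text are our own assumptions for illustration, not anything Intel supplied, and the sample is not output from a real Avoton.

```python
# Illustrative sketch only. FEATURE_FLAGS is an assumed mapping from the
# features named in the article to Linux /proc/cpuinfo flag names.
FEATURE_FLAGS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "POPCNT": "popcnt",
    "PREFETCHW": "3dnowprefetch",
    "AES-NI": "aes",
    "VT-x": "vmx",
    "Extended page tables": "ept",
    "Virtual processor IDs": "vpid",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each feature name to True/False depending on flag presence."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


# Hypothetical sample text, shortened for the example.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse4_1 sse4_2 popcnt aes vmx ept vpid\n"

if __name__ == "__main__":
    for name, present in report(SAMPLE).items():
        print(f"{name:24s} {'yes' if present else 'no'}")
```

On a real machine you would pass in the contents of /proc/cpuinfo instead of SAMPLE; which flags a given kernel reports, and on which line, can vary.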

The chip has two DDR3 memory controllers, which can drive regular DDR3 memory running at 1.5 volts or lower-powered memory running at 1.35 volts; memory sticks running at 1.6GHz (DDR3-1600 speeds) are supported.

Each controller has two DIMM slots, for a maximum of four slots with a total of 64GB of main memory using 8Gb memory chips. The memory controller actually has 38-bit physical addressing and 48-bit virtual addressing, in case you are wondering. (Just because a chip has 64-bit processing doesn't mean it has full 64-bit memory addressing.) The memory controllers have enhanced ECC memory scrubbing and other goodies to give the chip server-class memory reliability. These include a DDR scrambler, error injection with address/source match, and a hardware-based demand and patrol engine. The chip has 25.6GB/sec of peak memory bandwidth out of the main memory and into the L2 caches.
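The memory numbers above hang together if you do the sums. The sketch below is our back-of-the-envelope arithmetic, not an Intel formula: it assumes a standard 64-bit (8-byte) data bus per DDR3 channel, and it derives the 16GB-per-slot figure simply by dividing 64GB across four slots.

```python
# Back-of-the-envelope check of the memory figures quoted in the text.
CHANNELS = 2                        # two DDR3 controllers, one channel each
TRANSFERS_PER_SEC = 1_600_000_000   # the "1.6GHz" sticks, i.e. DDR3-1600
BYTES_PER_TRANSFER = 8              # assumption: 64-bit data bus per channel


def peak_bandwidth_gb_per_sec(channels, transfers_per_sec, bytes_per_transfer):
    """Peak bandwidth in GB/sec (decimal gigabytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


def addressable_gib(address_bits):
    """Size of an address space with the given width, in GiB."""
    return 2 ** address_bits // 2 ** 30


if __name__ == "__main__":
    bw = peak_bandwidth_gb_per_sec(CHANNELS, TRANSFERS_PER_SEC, BYTES_PER_TRANSFER)
    print(f"Peak memory bandwidth: {bw} GB/sec")
    print(f"38-bit physical space: {addressable_gib(38)} GiB")
    print(f"48-bit virtual space:  {addressable_gib(48) // 1024} TiB")
    print(f"Capacity per slot:     {64 // 4} GB (64GB across four slots)")
```

The 38-bit physical address space works out to 256GiB, comfortably more than the 64GB the four slots can actually hold.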

The Avoton chip also has four PCI-Express 2.0 – not 3.0 – controllers with a total of sixteen lanes of capacity. For the kinds of workloads that Intel is chasing, 80 lanes running at PCI-Express 3.0 speeds, as a two-socket Xeon E5 offers today, is a bit much. Part of the reason why those on-chip PCI-Express controllers do not have to work so hard is that the Avoton chip has two SATA 3.0 ports and four SATA 2.0 ports to link to physical storage and an integrated Ethernet controller that can be configured as four lanes running at either 1Gb/sec or 2.5Gb/sec. (El Reg was not aware of this, but switch ASICs from Broadcom, Intel, Marvell, Hewlett-Packard, and Cisco Systems support this 2.5Gb/sec mode, according to Brad Burres, one of the designers of the Avoton chip.) This on-die Ethernet controller is based on Intel's "Powerville" i350 discrete Ethernet controller chip, which has been goosed with that 2.5Gb/sec support as it was etched onto the Avoton die.
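To put the lane counts in perspective, here is a rough Python sketch of the raw I/O arithmetic. The signalling rates and line encodings (5GT/sec with 8b/10b for PCI-Express 2.0, 8GT/sec with 128b/130b for 3.0) come from the PCI-Express specs rather than from Intel's briefing, and the results are theoretical per-direction peaks that ignore protocol overhead.

```python
def pcie_gb_per_sec(lanes, gigatransfers_per_sec, payload_bits, encoded_bits):
    """Theoretical peak bandwidth per direction, in GB/sec.

    Each transfer moves one bit per lane; the line encoding means only
    payload_bits out of every encoded_bits carry data.
    """
    usable_gbit = lanes * gigatransfers_per_sec * payload_bits / encoded_bits
    return usable_gbit / 8


def ethernet_gbit_per_sec(ports, gbit_per_port):
    """Aggregate Ethernet bandwidth across all ports, in Gb/sec."""
    return ports * gbit_per_port


if __name__ == "__main__":
    avoton = pcie_gb_per_sec(16, 5.0, 8, 10)        # 16 lanes of PCIe 2.0
    xeon_e5 = pcie_gb_per_sec(80, 8.0, 128, 130)    # 80 lanes of PCIe 3.0
    print(f"Avoton, 16 x PCIe 2.0:         {avoton:.1f} GB/sec")
    print(f"Two-socket E5, 80 x PCIe 3.0:  {xeon_e5:.1f} GB/sec")
    print(f"Ethernet at 1Gb/sec x 4:       {ethernet_gbit_per_sec(4, 1.0)} Gb/sec")
    print(f"Ethernet at 2.5Gb/sec x 4:     {ethernet_gbit_per_sec(4, 2.5)} Gb/sec")
```

By this reckoning the Avoton's sixteen lanes peak at about a tenth of what a two-socket Xeon E5 can push, and the 2.5Gb/sec mode gets the on-die Ethernet to an aggregate 10Gb/sec.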

The Avoton also has a controller to drive four USB 3.0 ports and another controller for various legacy I/O devices.

The Edisonville platform based on the Avoton Atom

In a way, an Avoton SoC is like a baby Nehalem-EP server implemented on a single die. It has eight cores and 64GB of memory, and probably has roughly the same performance as a mid-line Nehalem system had nearly five years ago. (We'll find out more when Intel releases some benchmarks.)

Interestingly, the chip includes a Nehalem-style crossbar interconnect called the Silvermont System Agent, or SSA, that provides a point-to-point interface to those two-core CPU modules and their shared L2 caches. This system agent maintains the cache coherency across the core groups and also links to the Intel On-chip System Fabric. The IOSF was created for all of Intel's SoCs for both client and server devices. It has a high-speed fabric for those PCI-Express 2.0 controllers and then a secondary medium-speed fabric that the other controllers on the die slot into.

This lets those PCI-Express slots run unobstructed by other peripherals. The IOSF supports PCI-Express headers and ordering rules so existing operating systems and other storage software can make use of this bus without modifications. This IOSF bus runs at 400MHz, and you can gear it down to save power if you don't need it to run that fast. Intel's "Haswell" Core and Xeon designs have IOSF buses, and so do its recent generations of Xeon server chipsets.

The Avoton Atom C2000 chips, prices not included

There are five Avoton processors aimed at servers and eight Rangeley processors aimed at networking gear that are all based on the same core. They have two, four, or eight cores activated and, depending on the model, run at 1.7GHz, 2.0GHz, or 2.4GHz, with Turbo Boost adding either 200MHz or 300MHz. (One Rangeley chip, the one running at 2.0GHz, does not have Turbo Boost.) The four-port SATA 2.0 controller is not enabled on all models of the Avoton, and the low-end C2350 chip has only two cores, only one memory channel, and no SATA 2.0 ports, and that is how Intel is able to get its thermal design point down to 6 watts.

Here's how the Avoton and Rangeley chips map out against the Atom S1200 and the Xeon E3 v3 processors:

The Avoton and Rangeley chips versus Atom S1200 and Xeon E3 alternatives

It will be very interesting to see how microservers based on the Avoton chips compare to those based on the Xeon E3 processors in terms of both performance and bang for the buck. As of press time, Intel had not yet finalized pricing for the Avoton and Rangeley chips, so we can't do any comparisons yet.

What we can tell you is that when you add it all up, an Avoton thread can do about twice as much work as a Centerton thread or can do the same work for about one-fifth the power consumption. When you scale the cores up from two with Centerton to eight with Avoton, you can push a lot more work through the Avoton. Depending on the workload, a top-bin Avoton can do somewhere between five and ten times the work of a Centerton. ®
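A crude way to sanity-check that five-to-ten-times range is to multiply the per-thread gain by the growth in core count. The Python sketch below does just that; it is our own simplification, not Intel's methodology, and it ignores clock speeds, memory, I/O, and threading differences between the two chips.

```python
# Figures taken from the text above; the model itself is our simplification.
PER_THREAD_SPEEDUP = 2.0   # "about twice as much work" per thread
CENTERTON_CORES = 2
AVOTON_CORES = 8


def ideal_chip_speedup(per_thread, old_cores, new_cores):
    """Per-thread gain multiplied by the increase in core count."""
    return per_thread * (new_cores / old_cores)


def ratio_to_ideal(observed, ideal):
    """How an observed speedup compares with the naive ideal."""
    return observed / ideal


if __name__ == "__main__":
    ideal = ideal_chip_speedup(PER_THREAD_SPEEDUP, CENTERTON_CORES, AVOTON_CORES)
    print(f"Naive ideal speedup: {ideal}x")
    for observed in (5.0, 10.0):
        print(f"{observed}x observed is {ratio_to_ideal(observed, ideal):.3f} of ideal")
```

The naive figure lands at eight times, inside the quoted range. Workloads at the top of the range must be getting help from something this model leaves out.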