AMD takes on Atom S server chips with 'Kyoto' Opteron Xs
Corralling multimedia and other flops-hungry workloads onto microservers
Given its full name, Advanced Micro Devices should be dominating in microservers, those densely packed, wimpy-cored machines that are good for all kinds of data center jobs. And with the launch of its "Kyoto" Opteron X processors, AMD is hoping to get the jump on Intel and its Atom S1200 Series chips, also aimed at microservers.
The BGA package of the Opteron X chip
The Opteron X chips are based on the same "Jaguar" core used in the trio of low-power Fusion APU chips aimed at client machines that were announced last week. And the connection between microservers and end-user devices is more than a server chip using a client device as a foundation.
"What is so great about the industry is that we got an ultrathin where the CPU costs $60 and the OS costs $60, and we got a smartphone or tablet where the CPU is $9 and the OS is free," Andrew Feldman, general manager of the server business unit at AMD, explained to El Reg.
"The interesting thing is that a $9 CPU can't do anything," he said. "Left alone, a $9 CPU can play Angry Birds, or display the world's knowledge if it is computed back on the cloud. So this stuff rolls right back into the data center. This transformation on the client side has put tremendous change and pressure on the data center."
In some cases, customers need hefty X86 engines to support databases or virtualized server instances or in-memory processing. But for application caching, web serving, dedicated hosting, and some other smaller applications, a single-socket microserver rather than a standard two-socket or four-socket server is the better option. And that is why AMD has cooked up the Opteron X chip.
The new microserver chip comes in two flavors: one with the Radeon graphics chip turned on and able to do some mathematical work for the server, and one that has the GPU on the chip turned off, and with both a lower price tag and lower energy consumption.
Four cats wanting to chew on Windows and Linux
"This is, bar none, the premiere small-core part in the industry," brags Feldman.
And based on the feeds and speeds comparison between the quad-core Kyoto chip and Intel's dual-core "Centerton" Atom S1200, launched in December last year, this statement holds up.
Stacking up the Opteron X-Series against the Intel Atom S1200
It is unclear how Kyoto will stack up against the eight-core "Avoton" Atom S Series server parts, which are expected in the second half of this year. The rumors running around only two months ago were that Avoton will have one, two, or four cores, but in the disclosure of the "Silvermont" architecture, Intel said that it can push this generation of Atom chips up to eight cores thanks to the employment of its 22-nanometer wafer-baking processes.
Whether it will do that on server parts is not known. But frankly, for some workloads, an eight-core Atom might be better than a four-core "Haswell" Xeon E3. Or, perhaps unfortunately for AMD, a four-core Opteron X. (We shall see in a few days with the Xeon E3s and a few months with the Atom S1200 v2s.) AMD is using Taiwan Semiconductor Manufacturing Corp to fab its Kyoto processors, and they are etched with 28nm tech.
But, still, this is AMD's time in the microserver sun.
"Everything in the data center is changing," Feldman said, "and in that environment, doing the same thing you have always done is unlikely to produce a different outcome. We are trying to do some different things – some of them will work, some of them will not work."
One of those different approaches, he said, is focusing less on higher-end x86 server chips and more on lower-end parts suitable for microservers, which are much lower-powered and much lower-priced. AMD is going to focus on performance per dollar, performance per watt, and adding functions through the APUs and their integrated graphics processors, which can be used to offload algorithms ranging from facial recognition to seismic processing.
It also means working with the Open Compute Project to open-source server designs based on AMD motherboards. And of course, it means peddling servers from AMD-acquired SeaMicro directly to data centers, often in competition with the server makers that buy its Opteron parts and, ironically, often using Intel Xeon E3 parts, not Opterons.
Block diagram of the Opteron X-Series chip
The Opteron X processor has four 64-bit "Jaguar" cores. The chip has 2MB of L2 cache that is shared across those four cores and an on-chip DDR3 memory controller that can address up to 32GB of main memory. (You can't do more than that without adding more memory lanes and fattening up the memory controller, which adds cost and heat. El Reg asked, specifically for workloads that might be memory bound more than CPU bound.)
The chip supports up to two memory sticks running at 1.6GHz in either SODIMM or UDIMM variants, and has error correction on both L2 cache and main memory, which is necessary for server workloads. There's a PCI-Express 2.0 controller on the die as well, which has eight lanes in total. There is also a controller for eight USB 2.0 ports and two USB 3.0 ports, plus two SATA ports that can support either the 2.0 or 3.0 level.
The Ethernet controller has not been brought down onto the die, as will be done with the future Intel Avoton parts, but Feldman says that eventually AMD will weld Ethernet NICs to the Opteron Xs. All of this crunchy goodness is wrapped in an FT3 ball-grid array package that measures 24.5mm on a side and that you plunk down into the system board.
The Opteron X1150 has all four cores activated and running at 2GHz. None of the cores are shut off, nor does AMD do bin sorting to create different SKUs, but you can obviously power-gate cores that are not being used.
The X1150 does not have the Radeon HD 8000 GPU on the die turned on, however. Depending on how different parts of the chip are exercised, the Opteron X1150 has a thermal design point of between 9 and 17 watts. In 1,000-unit trays, you can have the Opteron X1150 for $64.
The Opteron X2150 has the same feeds and speeds, but the cores run at 1.9GHz and the GPU chip is turned on to allow for offloading of work from the Jaguar cores when that work is better suited to the GPUs. Video compression offload and video encoding and decoding are two uses where Feldman says the Opteron X will see work in the data center.
The GPU hooks into the CPU port of the chip over the unified northbridge, or UNB in AMDspeak. The X2150 costs $99 in 1,000-unit quantities and has a range of 11 to 22 watts for its thermal design point, again depending on how hard you stress the GPU and CPU.
The 128 Radeon HD 8000 cores on the X2150 run at 600MHz and can process two single-precision operations at the same time for a total of 154 gigaflops of math oomph. The double-precision math only comes in at 9.3 gigaflops – which is not great – but for video, seismic, facial recognition, and even life sciences workloads, single precision math is key. Moreover, other GPUs are terrible at DP math, too, excepting ones such as the Tesla K20 and K20X from Nvidia, which are designed explicitly to do DP math because some HPC workloads require it.
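The single-precision figure above is just cores times clock times operations per clock. Here is a minimal sketch of that arithmetic, assuming the peak rate is a simple product of the three numbers quoted in the article; the function and variable names are made up for illustration.

```python
def peak_gflops(cores: int, clock_ghz: float, ops_per_clock: int) -> float:
    """Theoretical peak gigaflops: every core retires ops_per_clock each cycle."""
    return cores * clock_ghz * ops_per_clock


# Figures quoted for the X2150's GPU: 128 cores, 600MHz, two SP ops at a time.
x2150_sp_gflops = peak_gflops(cores=128, clock_ghz=0.6, ops_per_clock=2)

print(f"X2150 GPU peak single precision: {x2150_sp_gflops:.1f} gigaflops")
```

That works out to 153.6 gigaflops, which the article rounds to 154. This is a theoretical ceiling, not a measured result.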
The issue then becomes how many of these Kyoto chips you can cram onto a card and feed with memory. With an HP "Redstone" Moonshot 1500 microserver enclosure, you could put about 277 teraflops into a rack at single precision (that is just the GPUs, not counting the calculating capabilities of the CPUs) at a cost of $178,000 for just the processors, or about $643 per teraflops at the GPU level. And if you want to be really fair, you have to isolate the cost of the GPU on the X2150 chip: multiply the cost of the X1150 by 1.9 divided by 2 (the clock speed ratio), which comes out to $60.80, and subtract that from the $99 cost of the X2150, which leaves $38.20. When you do that, the GPU cost of a rack of Moonshot servers with 277 teraflops comes to $248 per teraflops.
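The cost-per-teraflops reasoning can be checked on a per-chip basis, since the rack-level ratio is the same as the single-chip ratio. The sketch below assumes the list prices quoted in the article and a peak of 153.6 gigaflops per X2150 GPU (128 cores at 600MHz, two operations per clock); the function and variable names are hypothetical. Because the article works from rounded rack totals, the per-chip results land within a dollar or two of its $643 and $248 figures rather than matching them exactly.

```python
def cost_per_teraflops(price_usd: float, gflops: float) -> float:
    """Dollars per teraflops for one part rated at `gflops` gigaflops."""
    return price_usd / (gflops / 1000.0)


X1150_PRICE = 64.0       # USD, 1,000-unit tray price, GPU off, 2.0GHz cores
X2150_PRICE = 99.0       # USD, 1,000-unit tray price, GPU on, 1.9GHz cores
X2150_SP_GFLOPS = 153.6  # assumed peak: 128 cores x 0.6GHz x 2 ops per clock

# Charging the whole chip price to the GPU's flops.
whole_chip = cost_per_teraflops(X2150_PRICE, X2150_SP_GFLOPS)

# Scale the X1150 price by the clock ratio to estimate the CPU share,
# then treat what is left of the X2150 price as the cost of the GPU.
cpu_share = X1150_PRICE * 1.9 / 2.0
gpu_share = X2150_PRICE - cpu_share
gpu_only = cost_per_teraflops(gpu_share, X2150_SP_GFLOPS)

print(f"CPU share of X2150 price: ${cpu_share:.2f}")
print(f"GPU share of X2150 price: ${gpu_share:.2f}")
print(f"Whole chip: ${whole_chip:.0f} per teraflops")
print(f"GPU only:   ${gpu_only:.0f} per teraflops")
```

The clock-ratio scaling is the article's own back-of-the-envelope method for splitting the price, not an AMD cost breakdown.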
For comparison, a Quadro K5000 card from Nvidia, based on the K10 graphics chip that is designed for SP work, is rated at 2.1 teraflops single precision. It costs $2,249, which works out to $1,071 per teraflops for just the GPU card. ®