Original URL: http://www.theregister.co.uk/2010/03/29/amd_opteron_6100_launch/

AMD draws x64 battle lines with 'Magny-Cours'

Opteron 6100s lock and load

By Timothy Prickett Morgan

Posted in Servers, 29th March 2010 05:02 GMT

With AMD's launch of its "Magny-Cours" Opteron 6100 processors today, another battalion in the x64 War of 2010 is moving into position, opposite the field from Intel's "Westmere-EP" Xeon 5600s. Tomorrow, Intel will roll out its big-gun "Nehalem-EX" Xeon 7500s, and in the second quarter, AMD will move its entry "Lisbon" Opteron 4100s into the front lines. The shooting will not wait until all the chips are in the field, of course, and this morning, the battle is already loud and smoky.

In case you were busy two weeks ago, Intel got onto the x64 battlefield first with its Xeon 5600s. Intel put out fifteen Xeon 5600 processors as well as the related Xeon 3600 for single-socket workstations and the Core i7-980X Extreme Edition for high-end PCs. The Xeon 5600 chips are socket-compatible with last year's quad-core Xeon 5500s, and they come with four or six cores and 12 MB of L3 cache spread across those cores, plus HyperThreading simultaneous multithreading and Turbo Boost, which allows cores in a chip to speed up as others in the chip are shut down.

The Xeon 5600s came with thermal design points (TDPs) of 40, 60, 80, 95, and 130 watts, and clock speeds ranged from 1.86 GHz to 3.33 GHz. Intel kept some entry Xeon 5500 parts in the lineup, which were missing HyperThreading and Turbo Boost as well as a number of other features, to give it entry performance and price points - presumably to shoot at the Opteron 4100s, due in the second quarter for single- and dual-socket servers.

The Opteron 6100s are the big guns in the AMD lineup, and they overlap with the new Xeon 5600s and the low end of tomorrow's Xeon 7500s. The Xeon 5600s are aimed at two-socket boxes, and the Xeon 7500s were supposed to replace the Xeon MPs, aimed at four-socket and larger machines - when vendors actually got around to making servers with 8, 16, or 32 sockets. But a number of server makers are cooking up two-socket and four-socket machines based on the Xeon 7500s because of the extended memory they offer and because in a lot of cases, server buyers are more constrained by memory capacity and bandwidth than processing capacity. This is yet another way that the uptake of virtualization puts pressure on the chip and server makers.

With the market for eight-socket and larger boxes dwindling and the four-socket market expected to see a similar decline, it is no wonder that AMD decided to bifurcate its product line differently from the old Opteron 1000, 2000, and 8000 series, focusing on one-, two-, and four-socket platforms with two distinct processor, memory, and energy profiles. Those looking for the densest and most energy-efficient server platforms with one or two sockets are expected to go for the Opteron 4100s, while those looking for more cores and more memory per socket, as well as up to four sockets, are expected to go for the Opteron 6100s.

AMD wants one set of chip cores and chipsets that get implemented in two slightly different packages to address the bulk of the market. Intel still has two distinct processors and chipsets. AMD wants to compete on price and ramp up its volumes.

As expected, and as finally confirmed by John Fruehe, director of product marketing for server and workstation products at AMD, the Opteron 6100 is really two of the upcoming six-core Lisbon Opteron 4100 processors implemented in a single package. The combined chips get baked into a package with 1,944 pins that plugs into the G34 socket. The Opteron 4100s will slide into a tweaked version of the Rev F socket, which has 1,207 pins. Both use organic land grid array (LGA) packaging to connect the processor to the socket. With two Lisbon dies, each carrying 6 MB of L3 cache, the total transistor budget for the Opteron 6100 comes to 1.81 billion, and each die measures 346 square millimeters, for a total of 692 square millimeters.

The Opteron 4100 and 6100 chips are both implemented in a 45 nanometer silicon on insulator process and are baked by AMD's spun-out chip biz, GlobalFoundries. The Lisbon cores are very similar to those in the six-core "Istanbul" Opteron 2400 and 8400 processors launched last June. The transistor budgets and processor areas are the same, as are the caches. The big change is the shift from DDR2 to DDR3 memory for the integrated memory controllers.

Magny-Cours, clothed

AMD did not provide a picture of the naked Magny-Cours chips snuggling up to each other, but here's what it looks like from the outside:


The Opteron 6100 package: two chips making the CPU with two backs

Intel will probably make a little fun of AMD in that it can't get a dozen cores onto a single piece of silicon and had to double up. But then again, no one else, including Intel, can get more than eight cores into a single package at this point.

Jamming two chips into one package and clocking them down to get more threads to chew on software is a trick that Intel, Hewlett-Packard, and IBM have all done to move their server lineups along, and if Oracle caught a clue, it would figure out how to get multiple "Rainbow Falls" Sparc T3 chips into a single package and jack up the performance of its own Sparc machines. Oracle could drop the clock speed from the expected 1.67 GHz to maybe 1.2 GHz but double up the cores from 16 to 32 per package and maybe boost the performance of the product line by another 50 per cent or so in its four-socket Sparc T5000 series machines.

The Lisbon chips, and therefore the Magny-Cours chips, have 64 KB of L1 data and 64 KB of L1 instruction cache per core, plus 512 KB of L2 cache per core. The Lisbon chip has 6 MB of L3 cache per processor, and therefore, the double-stuffed Magny-Cours has 12 MB of L3 cache per socket. The Lisbons have six cores, but in some cases, AMD is selling partial duds (as all chip makers do) with only four working cores. The Magny-Cours chips therefore come with either eight or twelve cores activated. Here's the Opteron 6100 lineup:


The AMD "Magny-Cours" Opteron 6100 processors
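To make the cache arithmetic concrete, here is a minimal Python sketch that totals the on-chip cache in one G34 socket from the per-core and per-die figures above. It assumes, as the text implies, that the full 12 MB of L3 stays enabled on both the eight-core and twelve-core parts; the function name `cache_per_socket_mb` is purely illustrative.

```python
# Per-core cache sizes for the Lisbon cores, as given in the article.
L1_DATA_KB = 64
L1_INSTR_KB = 64
L2_KB = 512
# Shared L3: 6 MB per Lisbon die, two dies per Magny-Cours package.
L3_PER_DIE_MB = 6
DIES_PER_PACKAGE = 2


def cache_per_socket_mb(active_cores):
    """Total on-chip cache (in MB) in one G34 socket for a given active core count.

    Assumes each active core keeps its full L1 and L2 and that all 12 MB of L3
    stays enabled whether eight or twelve cores are switched on.
    """
    l1 = active_cores * (L1_DATA_KB + L1_INSTR_KB) / 1024
    l2 = active_cores * L2_KB / 1024
    l3 = L3_PER_DIE_MB * DIES_PER_PACKAGE
    return {"L1": l1, "L2": l2, "L3": l3, "total": l1 + l2 + l3}


for cores in (8, 12):
    print(cores, cache_per_socket_mb(cores))
# 8 {'L1': 1.0, 'L2': 4.0, 'L3': 12, 'total': 17.0}
# 12 {'L1': 1.5, 'L2': 6.0, 'L3': 12, 'total': 19.5}
```

On these assumptions, the shared L3 accounts for well over half of the cache in either configuration.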

One thing you will notice: the Opteron 6100s come in standard, Special Edition (SE), and Highly Efficient (HE) versions, as AMD has promised a number of times (see here for instance). But the bar has moved, as it has in the past, for what standard, SE, and HE parts mean. With the four-core and six-core predecessors to the Lisbon and Magny-Cours Opterons, the standard thermal envelope parts were rated at 75 watts using AMD's Average CPU Power (ACP) measure. The SE chips had slightly higher clock speeds and burned 95 watts, while the HE versions slowed down the clocks and dropped the voltage to get down to 55 watts.

(The Extremely Efficient, or EE, Opteron parts were rated at 40 watts, but these are not going to be available in the Opteron 6100 packaging. There will be no SE parts among the Opteron 4100s, but there will be standard, HE, and EE parts.)

With the Opteron 6100, the SE part is rated at 105 watts, the standard part at 80 watts, and the HE part at 65 watts. So there has been, once again, some watt creepage in these definitions, as there was in the jump from dual-core to quad-core Opterons.
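The watt creep is easy to quantify. This short Python sketch, using only the ACP ratings quoted above, computes the absolute and relative increase for each tier; the dictionary and function names are illustrative, not anything AMD publishes.

```python
# ACP ratings in watts, as quoted in the article.
OLD_ACP = {"SE": 95, "standard": 75, "HE": 55}   # four-core and six-core Opterons
NEW_ACP = {"SE": 105, "standard": 80, "HE": 65}  # Opteron 6100


def watt_creep(old, new):
    """Return {tier: (extra watts, extra per cent)} for each ACP tier."""
    return {
        tier: (new[tier] - old[tier], 100.0 * (new[tier] - old[tier]) / old[tier])
        for tier in old
    }


for tier, (watts, pct) in watt_creep(OLD_ACP, NEW_ACP).items():
    print(f"{tier}: +{watts} W ({pct:.1f} per cent)")
# SE: +10 W (10.5 per cent)
# standard: +5 W (6.7 per cent)
# HE: +10 W (18.2 per cent)
```

In relative terms, the HE tier has crept the most.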

Intel was talking up the fact that it had embedded cryptographic instructions in the new Xeon 5600s to implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data. Opterons will not get similar instructions until next year, with the "Bulldozer" cores.

HyperTransported

While all of that is interesting, the bigger and perhaps more important change in the move from the Opteron 2000 and 8000 series to the Opteron 4000 and 6000 series is the jump from three to four HyperTransport 3.0 (HT3) links in the point-to-point architecture that defines the Opteron. With the Direct Connect 1.0 architecture that defined prior Opteron machines, processors had integrated memory controllers and implemented a NUMA access method to reach into each other's memory when needed.

The processors had two memory channels per socket and could have eight DDR1 or DDR2 memory DIMMs per socket. The NUMA architecture implemented in the point-to-point interconnect had the processors linked to each other in a square, so in a four-socket machine a processor could talk directly to its two immediate neighbors, but to reach the socket diagonally opposite it, it had to route through one of those neighbors. That extra hop added latency and slowed down performance.

With the Direct Connect Architecture 2.0 implemented for the Opteron 6100 machines, the processors now have a crossbar switch and all four sockets in the box are directly linked to each other, eliminating that extra hop. Each socket now has four memory channels (double that of earlier machines) and can support a dozen DDR3 DIMMs (up 50 per cent).
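The difference between the two interconnect layouts comes down to hop counts. The Python sketch below models each socket as a graph node, following the article's description (a square of four for Direct Connect 1.0, every socket wired to every other for 2.0), and uses breadth-first search to count HyperTransport hops. It is a socket-level simplification: it ignores the two dies inside each Magny-Cours package and says nothing about actual latencies, and the names are hypothetical.

```python
from collections import deque
from itertools import combinations

SOCKETS = range(4)

# Direct Connect 1.0, as described: four sockets linked in a square, so
# sockets 0 and 2 (and 1 and 3) sit diagonally and share no direct link.
SQUARE = {(0, 1), (1, 2), (2, 3), (3, 0)}
# Direct Connect 2.0, as described: every socket linked to every other one.
FULL = set(combinations(SOCKETS, 2))


def hops(links, src, dst):
    """Breadth-first search for the fewest HyperTransport hops from src to dst."""
    neighbours = {s: set() for s in SOCKETS}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("sockets are not connected")


def hop_stats(links):
    """Return (worst-case, average) hops over every pair of distinct sockets."""
    counts = [hops(links, a, b) for a, b in combinations(SOCKETS, 2)]
    return max(counts), sum(counts) / len(counts)


print(hop_stats(SQUARE))  # (2, 1.3333333333333333)
print(hop_stats(FULL))    # (1, 1.0)
```

In the square, two of the six socket pairs need an extra hop; with every socket directly linked, no remote memory access is more than one hop away.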

This architecture, says Fruehe, is designed to scale up to 16 cores per processor, which is what will be needed to support the future 16-core "Bulldozer" chips in 2011. (See here for more on Bulldozer, which will plug into the G34 sockets used for the Opteron 6100s and the C32 sockets used for the Opteron 4100s.)

According to Fruehe, the integrated DDR3 main memory controller on the Opteron 6100s can support DDR3 DIMMs running at up to 1.33 GHz, delivering up to 42.7 GB/sec of memory bandwidth per G34 socket. That's 2.5 times the memory bandwidth of the Istanbul Opterons. If you want to use low-voltage DDR3 memory modules, they top out at 1.07 GHz, which means you get 20 per cent less memory bandwidth but save around 10 per cent on DIMM power.
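The 42.7 GB/sec figure follows from simple arithmetic. Here "1.33 GHz" is an effective data rate of 1,333 megatransfers per second (DDR3-1333), and each DDR3 channel is 64 bits, or 8 bytes, wide. What follows is a minimal Python sketch of that theoretical peak, which real workloads will not reach; the function name is illustrative.

```python
BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gb_s(channels, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


standard = peak_bandwidth_gb_s(4, 1333.33)  # DDR3-1333, four channels per G34 socket
low_volt = peak_bandwidth_gb_s(4, 1066.67)  # low-voltage DDR3-1066
print(f"{standard:.1f} GB/sec")                                # 42.7 GB/sec
print(f"{100 * (1 - low_volt / standard):.0f} per cent less")  # 20 per cent less
```

Because bandwidth scales linearly with transfer rate, dropping from 1,333 to 1,067 MT/s costs exactly the 20 per cent the article quotes.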

AMD is supporting 8 GB DIMMs now with the Opteron 6100 systems, which means a two-socket box can support 24 memory slots (192 GB) and a four-socket box can go to 48 slots (384 GB). Don't get too excited about 16 GB DIMMs, though. Even when prices for these come down out of the stratosphere, perhaps late this year or early next, the memory controller on the Opteron 6100 is limited such that four twelve-core processors can address only up to 512 GB. Addressing more memory than this per system will require a move to the Bulldozer cores.
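The capacity ceiling can be sketched the same way. This Python snippet multiplies sockets by the dozen DIMM slots per socket and clips the result at the 512 GB limit cited above. It assumes that limit applies to two-socket boxes as well, though it makes no difference there, and `memory_gb` is a hypothetical helper.

```python
DIMM_SLOTS_PER_SOCKET = 12  # a dozen DDR3 DIMMs per G34 socket
ADDRESS_LIMIT_GB = 512      # Opteron 6100 ceiling cited for a four-socket box


def memory_gb(sockets, dimm_gb):
    """Return (installed, addressable) GB with every slot populated.

    Assumption: the 512 GB ceiling is applied regardless of socket count.
    """
    installed = sockets * DIMM_SLOTS_PER_SOCKET * dimm_gb
    return installed, min(installed, ADDRESS_LIMIT_GB)


for sockets in (2, 4):
    for dimm_gb in (8, 16):
        print(sockets, dimm_gb, memory_gb(sockets, dimm_gb))
# 2 8 (192, 192)
# 2 16 (384, 384)
# 4 8 (384, 384)
# 4 16 (768, 512)
```

A fully loaded four-socket box with 16 GB DIMMs would hold 768 GB, a third of which the controller could not use.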

Generally speaking, bin for bin, the twelve-core Magny-Cours chips provide about 88 per cent more integer performance and 119 per cent more floating point performance than the six-core "Istanbul" Opteron 2400 and 8400 chips they replace. These numbers are based on the SPECint_rate2006 and SPECfp_rate2006 benchmark tests, which pitted a six-core Opteron 2435 running at 2.6 GHz against a twelve-core Opteron 6174 running at 2.2 GHz.
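As a rough sanity check on those numbers, the sketch below divides AMD's quoted gains by the ratio of aggregate core count times clock speed for the two chips compared. This is a back-of-envelope calculation using only the figures in the paragraph above. It is not AMD's or SPEC's methodology, and it does not try to explain where any difference comes from.

```python
# Figures from the article: Opteron 2435 (6 cores, 2.6 GHz) vs Opteron 6174 (12 cores, 2.2 GHz).
OLD_CORES, OLD_GHZ = 6, 2.6
NEW_CORES, NEW_GHZ = 12, 2.2
SPEEDUP = {"SPECint_rate2006": 1.88, "SPECfp_rate2006": 2.19}  # +88 and +119 per cent

core_ghz_ratio = (NEW_CORES * NEW_GHZ) / (OLD_CORES * OLD_GHZ)
print(f"aggregate core-GHz ratio: {core_ghz_ratio:.2f}")  # 1.69

for test, ratio in SPEEDUP.items():
    # A value above 1.0 means the score grew faster than cores x clock alone.
    print(f"{test}: {ratio / core_ghz_ratio:.2f}x per core-GHz")
# SPECint_rate2006: 1.11x per core-GHz
# SPECfp_rate2006: 1.29x per core-GHz
```

On these figures, doubling the cores while dropping the clock from 2.6 GHz to 2.2 GHz yields about 1.69 times the raw core-GHz. Both quoted gains therefore come out ahead of what cores and clocks alone would predict.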

Bootnote: The new Opteron 6100s do not have specific instructions for accelerating AES encryption and decryption, contrary to what AMD originally told El Reg. The chips, like other Opterons, can run AES software, but that is not the same thing as having specific instructions to accelerate it. The Opterons will get a feature similar to the AES-NI instructions that came out with the "Westmere-EP" Xeon 5600s two weeks ago when the "Bulldozer" cores ship next year. ®