AMD laughs at Intel with Opteron Bulldozers
Have fun with that socket upgrade, boys
There is no such thing as the last laugh in the server chip business. But you can get the next laugh, and Advanced Micro Devices thinks it is going to get that sequential chuckle on rival Intel in the x64 server racket with next year's launch of the "Bulldozer" family of Opteron processors.
The reason is simple. This year, AMD has struggled through a major socket upgrade and server platform redefinition, including its first homegrown chipsets (if you consider ATI homegrown), while the world was trying to recover from the Great Recession. Intel's March 2009 rollout of its substantially revamped Nehalem architecture with the two-socket Xeon 5500s might have been pushed out a bit because of technical issues, but the need to do virtualization to cut costs during the recession and the substantially improved Nehalem design allowed Intel to weather the recession pretty well in addition to stealing back some market share from AMD.
The next battle in the x64 server war, which AMD started in early 2009 to stay in the game and which it has continued with its "Lisbon" Opteron 4100s and "Magny-Cours" Opteron 6100s in early 2010, is shaping up for the late summer or early fall of 2011. That's when AMD will get its first Opteron server processors based on the Bulldozer core to market. It is also about the same time that we expect to see "Sandy Bridge" Xeon processors for servers in the field.
That's nearly a year away, and in the meantime, AMD is going to have to rely on offering better bang for the buck on the current six-core Opteron 4100s (for uniprocessor and dual-socket boxes) and twelve-core Opteron 6100s (for 2P and 4P machines) relative to the current six-core Xeon 5600s (for two socket boxes) and eight-core Xeon 7500s (for four socket and larger machines).
If AMD had a Facebook page for the current crop of Opteron processors, its friend map would look like this:
Acer – which bought Gateway and is a server wannabe and also-ran like Gateway was for a decade – came out swinging at the Opteron 6100 debut, and Dell and Hewlett-Packard have shown some enthusiasm for the chips. The Opteron 4100s, while offering compact and clever motherboard and systems options, were less enthusiastically adopted. IBM seemed to have to be bound and dragged back to the Opteron side of the field after putting its eX5 chipset for Intel's high-end Xeon 7500s out for blades and racks and doing the easy Xeon 5600 refresh on existing products.
Oracle, which acquired once-Opteron-loving Sun Microsystems in January, took the Sun Fire and Sun Blade Opteron products and is now using them as a boat anchor on Larry Ellison's America's Cup yacht. The uptake by server partners for the current Opterons has been slow because of the chipset and socket changes required by server makers, who were understandably stingy during the downturn and annoyed with AMD over Opteron delays from a few years back.
With Intel getting its own server chip design act together with the Nehalem family, server makers focused their engineering in 2008, when the global economy went into the crapper, on the easy sell for 2009. AMD has been suffering since then, and many believe it will continue to suffer despite demonstrable price/performance advantages for its chips.
According to Pat Patla, vice president and general manager of AMD's server and embedded products unit, that other x64 chip maker - the one that brought you 64-bits, integrated memory controllers, and multicore processors first - is spoiling for a new fight in 2011 with the Bulldozer-based chips, which at AMD's Financial Analyst Day this week he characterized as "a whole new approach to the ISA" and as the biggest architectural change that AMD has made with its chips in a decade.
El Reg walked you through the finer points of the Bulldozer architecture last December and gave you an update on the chips' expected performance in August of this year. Without going over all the same details again, the Bulldozer concept is to design a chip that is halfway between cookie cutting whole computing elements and putting them on a die (as AMD does) and virtualizing instruction streams and threading across the virtual pipelines to boost performance (which is what Intel does with HyperThreading).
Intel, in fact, uses both techniques - cookie cutting and virtual threading (which is known more generically as simultaneous multithreading) - in its Xeon and Itanium chips; IBM uses similar techniques with eight-core, 32-thread Power7 chips and Oracle does likewise with its 16-core, 128-thread Sparc T3 chips.
With the Bulldozer chips, which are implemented in GlobalFoundries' 32 nanometer technologies, AMD wants to do what it calls "two strong threads," as the illustration below shows:
The Opteron Bulldozer core: Two strong threads, no HyperThreading
Each core - which means an integer unit and a floating point unit - has its own integer unit scheduler and L1 data caches, just like a single-core CPU did and the cores on multicore processors have today. But the cores share fetch and decode units as well as a floating point scheduler and L2 cache memory. The Bulldozer modules are cookie-cuttered in two-core units, and the future "Valencia" Opteron 4200 chip will be four of these modules with a shared memory controller, L3 cache, and northbridge spanning the four modules and eight cores. Each integer unit has four pipelines, capable of executing one instruction per cycle.
Each Bulldozer module has two 128-bit floating point units, which can do two 64-bit double-precision operations per clock or four 32-bit single precision operations. What is neat about the Bulldozer design is that either "core" in the module can grab the scheduler and if the other core is not doing floating point, then it can take all 256 bits and do four double precision or eight single precision ops in a clock using what AMD is now calling an AVX mode.
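The lane arithmetic in that description is easy to check for yourself. What follows is a back-of-the-envelope sketch in Python, not anything AMD has published: the helper name `ops_per_clock` is made up for illustration, and it assumes one operation per vector lane per clock, which is how the figures above read.

```python
def ops_per_clock(vector_bits: int, operand_bits: int) -> int:
    """Operations per clock, assuming one operation per vector lane."""
    return vector_bits // operand_bits


FPU_BITS = 128        # width of one floating point unit, as described above
UNITS_PER_MODULE = 2  # two such units in each Bulldozer module

# A core using a single 128-bit unit
dp_single_unit = ops_per_clock(FPU_BITS, 64)  # double precision
sp_single_unit = ops_per_clock(FPU_BITS, 32)  # single precision

# A core that grabs both units (256 bits) while its neighbor is idle
dp_both_units = ops_per_clock(FPU_BITS * UNITS_PER_MODULE, 64)
sp_both_units = ops_per_clock(FPU_BITS * UNITS_PER_MODULE, 32)

print(dp_single_unit, sp_single_unit, dp_both_units, sp_both_units)
```

The four values line up with the two-and-four and four-and-eight figures quoted for the 128-bit and 256-bit cases.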
At the meeting this week, Patla gave out a little more detail on the "Interlagos" Opteron 6200s, which will be two Opteron 4200s in a single G34 socket, and the Opteron 4200s, which will plug into the C32 socket (a modified version of the Rev F socket). Here's the before and after:
As you can see, the Bulldozers will have double the L2 cache, at 1 MB per core, plus a 33 per cent bump in L3 cache memory, to 8 MB per die. So on Interlagos chips, which put two Valencias on a package, the total L3 cache per socket will be 16 MB. The future Opterons will also have a Turbo Core mode that allows them to bump up their clock speed as workloads dictate by as much as 500 MHz, even when all of the cores on the chip or in the socket are being used to do work. (To use the equivalent Turbo Boost mode on Intel chips, you have to shut down all the cores but one on the chip, and you only get a nominal increase in clock speed during the time the cores are sleeping.)
As AMD said earlier this year, the Bulldozers, socket for socket, are expected to offer about 50 per cent more oomph with a 33 per cent increase in core count over the Opteron 4100s and 6100s - and do so in the same power bands of 65, 80, and 105 watts. Patla held to that performance increase this week at the financial analysts meeting. And he also provided some insight into how this will be accomplished.
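It is worth working out what those two percentages imply per core. The sketch below is rough arithmetic only, and it rests on an assumption AMD has not stated: that socket performance scales linearly with core count. The twelve-to-sixteen core counts are the Opteron 6100 and Interlagos figures; the variable names are ours.

```python
OLD_CORES = 12        # current Opteron 6100
NEW_CORES = 16        # Interlagos Opteron 6200
SOCKET_SPEEDUP = 1.5  # "about 50 per cent more oomph" per socket

# Core-count ratio: 16 / 12, the 33 per cent increase
core_ratio = NEW_CORES / OLD_CORES

# Assumption: throughput scales linearly with cores, so whatever is left
# over after the core-count increase is the implied per-core gain.
per_core_speedup = SOCKET_SPEEDUP / core_ratio

print(f"core count ratio: {core_ratio:.3f}")
print(f"implied per-core speedup: {per_core_speedup:.3f}")
```

Under that linear-scaling assumption, the per-core gain works out to roughly 12.5 per cent, which would have to come from things like the memory and floating point changes rather than from the extra cores.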
First, the memory controller on the Bulldozer Opteron processors will support 1.6 GHz memory, boosting clock speed by 20 per cent. The systems will be able to support load-reduced DIMM (LR-DIMM) DDR3 main memory, which allows more memory chips to be packed onto a memory stick, and 1.25 volt low-voltage memory will also be supported. The new memory controller will have "aggressive power down" and "partial power down" settings as well as memory power capping to keep systems within the thermal envelopes set by administrators. When you add all the memory changes up, Patla says the overall memory performance on the Bulldozer Opterons will be boosted by around 30 per cent.
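Those memory numbers can be sanity-checked the same way. This sketch backs out the memory speed implied by a 20 per cent boost to 1.6 GHz, then estimates how much of the 30 per cent overall gain is left for the other controller changes. That second step assumes the gains multiply, which is our simplification and not something Patla said.

```python
NEW_CLOCK_MHZ = 1600  # 1.6 GHz memory on the Bulldozer Opterons
CLOCK_BOOST = 0.20    # quoted clock speed increase
TOTAL_BOOST = 0.30    # quoted overall memory performance increase

# Memory speed on the current chips implied by the quoted boost
implied_old_clock_mhz = NEW_CLOCK_MHZ / (1 + CLOCK_BOOST)

# Assumption: clock and non-clock improvements combine multiplicatively
other_gain = (1 + TOTAL_BOOST) / (1 + CLOCK_BOOST) - 1

print(f"implied current memory clock: {implied_old_clock_mhz:.0f} MHz")
print(f"gain left for other changes: {other_gain:.1%}")
```

The implied starting point is about 1,333 MHz, and something over 8 per cent of the improvement would be down to the controller itself if the multiplicative assumption holds.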
On the floating point front, the Interlagos chip will be able to do 64 flops per cycle on the sixteen-core variant, which Patla estimates will be equivalent to the fastest Sandy Bridge Xeon Intel is expected to field next year. The six-core Xeon 5600s do 24 flops per cycle and there is a less-cored version of Sandy Bridge that will do 32 flops per cycle. The current twelve-core Opteron 6100 chip can do 40 flops per cycle.
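To put those per-chip figures on a common footing, you can divide each by its core count. The snippet below does only that, using the numbers quoted above; the Sandy Bridge parts are left out because their core counts are not given here, and the dictionary layout is just for illustration.

```python
# (flops per cycle per chip, cores per chip), as quoted above
chips = {
    "Interlagos Opteron 6200": (64, 16),
    "Xeon 5600": (24, 6),
    "Opteron 6100": (40, 12),
}

# Flops per cycle per core for each chip
per_core = {name: flops / cores for name, (flops, cores) in chips.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.2f} flops per cycle per core")
```

On these figures, Interlagos and the Xeon 5600 come out level per core, so the sixteen-core chip's advantage per socket is a matter of core count.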
With the upgrade to Interlagos being only a chip swap and a BIOS upgrade, Patla is confident that "HPC customers are going to open up the box to do those upgrades." Most enterprise customers don't upgrade their chips but rather replace their servers - with the exception of high-speed trading companies, which can make the cost of such an upgrade back in a short time if it gives them an edge on the stock market.
Those who want Interlagos and Valencia Opterons are going to have to wait a while.
"The product has taped out and we do have early silicon in the labs," said Patla. "We do have partners with silicon and we will be doing mass sampling in the Q4 timeframe. Production is expected to begin in Q2 and we expect to launch and have widespread availability in Q3 2011."
The Interlagos high-end variants of the Bulldozer Opterons will come first, with the Valencia low-end variants in the "high-low" scheme coming 60 to 90 days later.
It would have been more interesting if AMD could get these chips in the field ahead of Intel's Sandy Bridge Xeons, and you have to think that this was, indeed, the plan. You can sure bet that Intel will be trying to jump the gun on AMD with its 32 nanometer processes and the Sandy Bridge Xeons if at all possible. And just to make it interesting, maybe GlobalFoundries, AMD's wafer baker, can do a better job on the 32 nanometer ramp. It will be an interesting summer next year for servers, that's for sure. ®