AMD revs Opterons up to 6300 for fat x86 servers
Drop-in-socket performance jump, perhaps a Q4 revenue jump as well
Customers using big ol' fat x86 servers didn't have much to jump for joy about this year. There just isn't a lot going on. But to make things interesting, AMD is now goosing the performance of its top-end parts with the launch of its "Abu Dhabi" Opteron 6300s, which sport the "Piledriver" cores that already debuted in the FX Series of high-end desktop chips.
It is hard to say who made the decision first, but both AMD and Intel tore up their high-end server chip roadmaps sometime late last year. Well, more precisely, Intel trashed the last page and AMD threw the whole thing away.
Intel decided not to launch a "Sandy Bridge-EX" Xeon E7 part, waiting instead to boost its high-end CPUs during the "Ivy Bridge" generation, presumably next year. Intel finally let everyone know that the Sandy Bridge-EX part was not coming to market when it hosted its Intel Developer Forum shindig in September.
But as El Reg showed back in May 2011 when we got our hands on the undisclosed Intel roadmap, the Sandy Bridge variant of the Xeon E7 was most definitely in the works, with a "Romley" Socket-R system using the Sandy Bridge-EX part in four-socket servers and a "Brickland" server platform using an unknown socket and the "Ivy Bridge-EX" chip in machines with 2, 4, or 8 sockets. And as we all know, Intel instead quadrupled up the Xeon processors with the "Sandy Bridge-EP" E5-4600s and made a Romley server platform out of those.
AMD was planning to etch a tweaked die with up to ten "Bulldozer" cores and cram new Opteron 6300 and 4300 chips into shiny new (and incompatible) processor sockets by around now. The "Sepang" Opteron 4300 was slated to have ten cores, and the doubled-up "Terramar" would put two Sepangs in a single package and thus make the Opteron 6300.
But then Rory Read came in as AMD's new CEO and assembled a new management team that realized this was not going to work. So in November 2011 or so, the company decided to embrace the new Piledriver core, put it into existing sockets, and keep the Opterons at the same core counts with a modest clock speed boost. This plan was announced in February 2012.
It is hard to say who blinked first here, but Intel and AMD are both somewhat relieved to be able to take their foot off the gas at the high-end, particularly now with Power7+, Sparc T5 and M4, and Sparc64-X processors not yet much of a threat at the top of the server racket. El Reg's guess is that Intel caught wind of AMD's change in plan, saw it had breathing room, and followed suit.
In the meantime, AMD hunkered down and goosed the clock speeds on the existing Opteron 6200s in June of this year after doing a deep bin sort for parts that could spin higher, seeking whatever advantages it could field against the older Xeon E7s and the new Xeon E5-2600 and E5-4600 parts.
Die shot of the Opteron 6300
AMD has been working with the Open Compute open source server project, founded by Facebook, on the "Roadrunner" line of Opteron servers based on the high-end G34 sockets. And as El Reg previously reported, Open Compute accidentally outed some of the feeds and speeds of the Abu Dhabi Opteron 6300 processors, while at the same time getting the launch date wrong (the docs said Q2 2012, which obviously did not happen and was not part of the plan). That Roadrunner server spec said to expect a modest 200MHz clock-speed boost for the Abu Dhabi chips compared to the "Interlagos" Opteron 6200s.
If you ignore those faster Opteron 6200s that came out in June – the Opteron 6284 SE and 6278 – then all of the Opteron 6300 parts in the various voltage and core count options are indeed 200MHz faster. Here's how the Opteron 6300s line up against the Opteron 6200s they replace:
Stacking up new and old Opteron server chips
Generally speaking, AMD is raising prices a bit SKU-for-SKU between the Opteron 6300 and 6200 lines, from 10 per cent at the low-end of the range (from the Opteron 6344 down to the Opteron 6366 HE), to 26 per cent on the Opteron 6348, and from 34 to 38 per cent from the Opteron 6376 on up to the Opteron 6386 SE part.
HE, by the way, is short for Highly Efficient, and means a low-voltage part that throws off less heat per clock than regular parts. SE is short for Special Edition and means a higher-voltage, higher-clock part that has maximum performance but throws off the most heat.
Based on the extra 200MHz, you might think that the Opteron 6300s may not be worth the dough when compared to the Opteron 6200s they replace. But the chips have other features in those new Piledriver cores that help boost performance on top of those extra clocks, and therefore – AMD hopes – will convince customers to shell out a little extra.
To start with, the Piledriver cores include four new instructions.
FMA3 is a floating point fused multiply add instruction that is used for vector and matrix math and polynomial calculations commonly used in physical, chemical, and quantum simulations as well as in digital signal processing. Suresh Gopalakrishnan, vice president and general manager of AMD's server business unit, says that this FMA3 instruction will be compatible with the FMA3 instruction that Intel will add with future "Haswell" and "Broadwell" processors.
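For those wondering what "fused" buys you beyond speed, here is a minimal Python sketch of the idea. It is ours, not AMD's: the function names are made up for illustration, and the fused version is emulated with exact rational arithmetic rather than by calling any hardware instruction. The point is only that a fused multiply add rounds once, after the add, while a separate multiply and add round twice.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulate a fused multiply add: compute a*b + c exactly, round once.

    Illustrative emulation only; real FMA hardware does this in one step.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


# Inputs chosen so that the product's low-order bits matter.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(unfused_multiply_add(a, b, c))  # the tiny product term is rounded away
print(fused_multiply_add(a, b, c))    # the tiny term survives the single rounding
```

The exact product here is 1 - 2^-60, which a double cannot hold, so the unfused path loses it before the add ever happens. That kind of cancellation is why polynomial evaluation and dot products tend to like fused operations.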
BMI is a Bit Manipulation Instruction that both Intel and AMD support, and TBM, or Trailing Bit Manipulation, is one that only Piledriver cores have at the moment. Both BMI and TBM are used to simplify some of the more common bit-shuffling routines that are used in compressed databases, hashing routines, and certain arithmetic operations. The idea is to cut down on the number of clocks it takes to get this bit shuffling done.
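To give a flavor of the bit shuffling in question, here is a hedged Python sketch of a few classic identities of the sort these extensions are meant to collapse into a single operation. The helper names are ours, the 64-bit mask is an assumption to mimic a register width, and the mnemonics mentioned in the comments are our understanding of the mapping rather than anything quoted from AMD.

```python
MASK64 = (1 << 64) - 1  # assumed register width for this sketch


def isolate_lowest_set_bit(x):
    """Keep only the lowest 1 bit (the kind of job BMI's BLSI does)."""
    return x & -x & MASK64


def clear_lowest_set_bit(x):
    """Turn off the lowest 1 bit (BLSR-style)."""
    return x & (x - 1) & MASK64


def mask_of_trailing_zeros(x):
    """Set a 1 for every trailing 0 bit, clear the rest (TZMSK-style, TBM)."""
    return ~x & (x - 1) & MASK64


def fill_below_lowest_set_bit(x):
    """Set every bit below the lowest 1 bit (BLSFILL-style, TBM)."""
    return (x | (x - 1)) & MASK64


def count_set_bits(x):
    """Count 1 bits by repeatedly clearing the lowest one."""
    count = 0
    while x:
        x = clear_lowest_set_bit(x)
        count += 1
    return count


x = 0b10110000
print(bin(isolate_lowest_set_bit(x)))
print(bin(clear_lowest_set_bit(x)))
print(bin(mask_of_trailing_zeros(x)))
print(bin(fill_below_lowest_set_bit(x)))
print(count_set_bits(x))
```

Each helper is two or three ordinary ALU operations when written out longhand, which is exactly the overhead a dedicated instruction is supposed to shave off in tight hashing or bitmap-scanning loops.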
The final new instruction, also supported by both AMD and Intel, is called F16c – and no, it is not a fighter-bomber that is cloud-ready but rather an instruction that converts between 32-bit single-precision floating point values and half-precision 16-bit formats and is, according to AMD, used in certain multimedia apps. The F16c instruction was added to Intel's Ivy Bridge family of processors.
You can read about the new instructions in the latest AMD software-optimization guide.
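What that conversion costs you in precision can be shown with a small Python sketch. This is an illustration of the half-precision format using the standard library's `struct` module, not a call into the F16c hardware path, and the helper name is invented for the example.

```python
import struct


def to_half_and_back(value):
    """Round a value to single precision, pack it as a 16-bit half, unpack it.

    Returns the two packed bytes and the value the half actually stores.
    """
    # Start from a 32-bit single, as the conversion described above does.
    single = struct.unpack("<f", struct.pack("<f", value))[0]
    packed = struct.pack("<e", single)  # "e" is IEEE 754 binary16
    return packed, struct.unpack("<e", packed)[0]


packed, stored = to_half_and_back(0.1)
print(len(packed), stored)
```

Two bytes instead of four halves the memory traffic, at the price of roughly three decimal digits of precision, which is a trade image and audio code is often happy to make.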
New features in the Opteron 6300 processors
In addition to the new instructions, there is a slew of other nips, tucks, and tweaks that come with the Piledriver cores. The branch predictor has been improved, for example, and so have the schedulers for both the floating point and integer units. The L1 caches on the dual-core Opteron module have larger translation lookaside buffers, and the load queue in the load/store unit has been tweaked to improve store-to-load forwarding. The data prefetcher has been goosed, and the L2 cache is a bit more efficient, as well. The DDR3 main memory sticks hanging off the Opteron 6300s can run at a maximum of 1.87GHz, up from 1.6GHz with the Opteron 6200s.
Add up instructions per cycle tweaks and clock speed bumps, and you are talking about something on the order of 7 to 8 per cent performance improvements on raw floating point and integer performance. With significant tuning with compilers, you should get performance improvements as high as 24 per cent. (That latter number is for Java performance as gauged by the SPECjbb2005 Java benchmark.)
As with the Opteron 6200s, the 6300s have a Turbo Core mode that lets the cores accelerate by as much as 500MHz if other components of the chip are not too hot, and as high as 1.3GHz if half of the cores are shut down.
Relative performance of old and new Opteron G34-class chips
If you do the math on a system using two Opteron 6278s compared to a machine with the new Opteron 6380s, the performance per watt is 40 per cent better. This is significant.
Interestingly, AMD is not talking up a VMmark virtualization benchmark test result comparing old to new, given how the Opterons are well regarded for running hypervisors and virtual machines.
AMD Opteron 6300s versus Intel Xeon E5 processors
AMD is not shy about ranking its new Opteron 6380, the top-bin 115 watt parts with 16 cores spinning at 2.5GHz, against the Xeon E5-2690, the top-bin Intel Xeon E5-2600 processor with eight cores (but 16 threads) spinning at 2.9GHz and burning 135 watts. You would think that to be fair, AMD would match up its 2.8GHz Opteron 6386 SE, at 140 watts, against this E5-2690, but the point that AMD is trying to make is that the regular Opteron 6380 offers roughly the same performance on memory-intensive workloads popular in the HPC community as the E5-2690, but costs about half as much at list price for 1,000-unit trays.
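AMD's pitch boils down to simple ratios, sketched below in Python. The inputs are the article's own round numbers: equal relative performance ("roughly the same"), a relative price of one half ("about half as much"), and the rated 115 watt and 135 watt figures. Treating rated wattage as actual power draw is an assumption of this sketch, and the function name is ours.

```python
def advantage(perf_a, cost_a, perf_b, cost_b):
    """How many times more performance per unit of cost A delivers than B."""
    return (perf_a / cost_a) / (perf_b / cost_b)


# Relative performance on memory-intensive work: taken as equal.
OPTERON_6380_PERF = 1.0
XEON_E5_2690_PERF = 1.0

# Performance per dollar, with the Opteron at half the Xeon's list price.
per_dollar = advantage(OPTERON_6380_PERF, 0.5, XEON_E5_2690_PERF, 1.0)

# Performance per rated watt (assumption: rated watts stand in for real draw).
per_watt = advantage(OPTERON_6380_PERF, 115.0, XEON_E5_2690_PERF, 135.0)

print(f"per dollar: {per_dollar:.2f}x, per rated watt: {per_watt:.2f}x")
```

On those assumptions the Opteron 6380 comes out at twice the bang per buck and about 17 per cent ahead per rated watt, which only holds for workloads where the two chips really do perform alike.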
AMD has been shipping the Opteron 6300s since September and will be getting revenue from these processors in the fourth quarter. Supercomputer maker Cray is already shipping the new Opteron processor in its XK7 supers, which marry Opteron processors and Nvidia Tesla GPU coprocessors, and Gopalakrishnan says that HP and Dell are ramping up to start shipping the new processors in their respective ProLiant and PowerEdge machines. All told, the Opteron 6300s will be available in over 30 different machines. ®