
AMD rides Bulldozers into the x86 server chip war

Opteron 6200s AND 4200s enter field against Xeon E5s


SC11 Advanced Micro Devices was expected to launch its "Interlagos" Opteron 6200 server processors about now in conjunction with the SC11 supercomputing conference in Seattle.

But what wasn't known was that AMD was going to kick out the entry eight-core "Valencia" Opteron 4200 processors now, too, rather than do a two-step launch.

AMD took a two-step approach with its prior generation of server chips, rolling the twelve-core "Magny-Cours" Opteron 6100s, the big guns, onto the field of the ongoing x86 server chip war in March 2010 for two-socket and four-socket servers and following up with the six-core "Lisbon" Opteron 4100s for machines with one or two sockets in June 2010. The Opteron 6100s got a deep bin sort and a speed boost in February this year and otherwise it has been all quiet on the Opteron front.

AMD has been giving the Opteron 4200s and 6200s air support before they entered the field, talking about the new design of the "Bulldozer" core and how it will make for better server chips that can meet a widening array of workload, performance, and thermal requirements.

AMD Bulldozer core block diagram

The Bulldozer core: share some things and reduce power draw

The Opteron server chips using the Bulldozer cores are implemented in GlobalFoundries' 32nm, 11-metal layer, high-k metal gate, silicon-on-insulator wafer-baking processes. The former AMD foundry, which was spun out three years ago, has had some trouble ramping up this 32 nanometer process, giving AMD headaches and also meaning it could not meet demand for the PC and server chips based on the Bulldozer cores.

As El Reg explained in detail earlier this year when AMD's techies divulged some secrets about the core design at the IEEE's International Solid-State Circuits Conference, the Bulldozer core module has some components shared across two cores, but also gives each core its own thread (with no simultaneous multithreading). AMD refers to this as having "two strong cores" in contrast to the HyperThreading virtual cores Intel puts in its Core and Xeon processors. Each core – which means an integer unit and a floating point unit – in the Bulldozer module has its own integer unit scheduler and L1 data caches, but the cores share fetch and decode units as well as a floating point scheduler and L2 cache memory.

Each integer unit in each Bulldozer core has four pipelines, capable of executing one instruction per cycle. A Bulldozer core module has two 128-bit floating point units, each of which can do two 64-bit double-precision operations or four 32-bit single-precision operations per clock. If one core is not using its floating point unit during a cycle, then the other core can take all 256 bits and do four double-precision or eight single-precision ops in a single clock cycle.
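The floating point arithmetic above can be checked with a few lines of Python. This is only a sketch of the width-divided-by-operand-size sums in the paragraph, not a model of AMD's scheduler; the function and constant names are made up for illustration.

```python
def ops_per_clock(datapath_bits: int, element_bits: int) -> int:
    """Return how many operands of element_bits fit in datapath_bits per clock."""
    if element_bits <= 0 or datapath_bits % element_bits != 0:
        raise ValueError("datapath width must be a whole multiple of element width")
    return datapath_bits // element_bits


FP_UNIT_BITS = 128        # width of one floating point unit, per the article
FP_UNITS_PER_MODULE = 2   # two such units in each Bulldozer module
DOUBLE_BITS = 64          # double-precision operand
SINGLE_BITS = 32          # single-precision operand

# Case 1: both cores are busy, so each core gets one 128-bit unit.
split_dp = ops_per_clock(FP_UNIT_BITS, DOUBLE_BITS)
split_sp = ops_per_clock(FP_UNIT_BITS, SINGLE_BITS)

# Case 2: one core leaves its unit idle, so the other core takes all 256 bits.
ganged_bits = FP_UNIT_BITS * FP_UNITS_PER_MODULE
ganged_dp = ops_per_clock(ganged_bits, DOUBLE_BITS)
ganged_sp = ops_per_clock(ganged_bits, SINGLE_BITS)

print(f"split:  {split_dp} DP or {split_sp} SP ops per clock per core")
print(f"ganged: {ganged_dp} DP or {ganged_sp} SP ops per clock for one core")
```

The two cases reproduce the two-or-four and four-or-eight figures quoted above.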

AMD was originally calling this feature an AVX mode, but is now on announcement day calling it Flex FP. Flex FP does support AVX operations. The floating point unit has new multiply-accumulate functions and also supports a bunch of new instructions, including SSE3, SSE4.1, and SSE4.2 SIMD extensions, on-chip AES encryption/decryption, and PCLMULQDQ, which is used to perform a carry-less multiplication of two 64-bit integers. AMD has also added new instructions called XOP and FMA4, which are tweaks to 128-bit SSE5 and SIMD instructions that are more compatible with Intel's AVX implementations.

The Bulldozer module has 2MB of L2 cache memory and has a total of 213 million transistors; it has an area of 30.9 square millimeters and is designed to run at between 0.8 and 1.3 volts. Each core in the Bulldozer module has 16KB of data cache and there is 64KB of shared instruction cache per module. The module has 1MB of L2 cache per core (twice that of the prior Opteron 4100 and 6100 chips), and the four-module chip package has a third more L3 cache per chip, at 8MB.
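For anyone who wants to tot up the cache on a four-module chip, here is a minimal sketch using only the per-module figures quoted above. The class and function names are hypothetical, and sizes are kept in KB to avoid fractions.

```python
from dataclasses import dataclass

KB_PER_MB = 1024


@dataclass(frozen=True)
class BulldozerModule:
    """Cache figures for one Bulldozer module, as quoted in the article."""
    cores: int = 2
    l1d_per_core_kb: int = 16            # private data cache per core
    l1i_shared_kb: int = 64              # instruction cache shared by the module
    l2_shared_kb: int = 2 * KB_PER_MB    # L2 shared by both cores

    @property
    def l2_per_core_kb(self) -> int:
        return self.l2_shared_kb // self.cores


def die_totals(module: BulldozerModule, modules_per_die: int = 4,
               l3_kb: int = 8 * KB_PER_MB) -> dict:
    """Sum cores and cache across one four-module chip."""
    return {
        "cores": module.cores * modules_per_die,
        "l1d_kb": module.l1d_per_core_kb * module.cores * modules_per_die,
        "l1i_kb": module.l1i_shared_kb * modules_per_die,
        "l2_kb": module.l2_shared_kb * modules_per_die,
        "l3_kb": l3_kb,
    }


module = BulldozerModule()
totals = die_totals(module)
print(f"L2 per core: {module.l2_per_core_kb} KB")
for name, value in totals.items():
    print(f"{name}: {value}")
```

On these numbers a four-module chip carries as much L2 (8MB) as L3 (8MB).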

The Bulldozer cores have a new memory controller that can support up to 384GB of memory per socket (up from a too-skinny 128GB with the prior controller) as well as DDR3 memory running at 1.6GHz. AMD says that the new controller can support load-reduced (LR-DIMM) main memory, which allows more memory chips to be packed onto a memory stick, and 1.25 volt (ultra-low-volt) memory will also be supported in addition to the 1.5 volt (regular) and 1.35 volt (low-volt) varieties. The new memory controller has "aggressive power down" and "partial power down" settings as well as memory power capping to keep systems within the thermal envelopes set by administrators.

Here's what the Bulldozer module looks like:

AMD Bulldozer core module zoom

To make an eight-core Valencia Opteron 4200, you put four of these Bulldozer modules on a single piece of silicon and wrap them up with a shared DDR3 main memory controller and 8MB of L3 cache, like this:

AMD Bulldozer Valencia Opteron chip

To make a 16-core Opteron 6200 processor, you put two of these in a single package, like this:

AMD Opteron 6200 die

AMD's double-stuffed Opteron 6200 processor

The one thing that the new Opteron processors do not have is support for PCI-Express 3.0 peripherals, either on the chip itself or in the chipset. The forthcoming "Sandy Bridge-EP" Xeon E5 will have on-chip PCI-Express 3.0 controllers, as El Reg revealed back in May.

"If you ask our competitor, PCI-Express 3.0 is a big deal," says John Fruehe, director of product marketing for servers and workstations at AMD. "If you ask anyone else, it doesn't make a stinking difference. The important thing is that PCI-Express 3.0 forces a platform change that only benefits a few select applications. We'll be there when it is relevant. For us, it is more important to time it right than to be first to market."

That is precisely why AMD didn't rush to support DDR3 main memory with the Opterons, or goose the memory controllers with more capacity.

The Interlagos chip has a total of 2.4 billion transistors, which means the Valencia chip has 1.2 billion.
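A back-of-envelope sketch of that transistor budget follows, combining the chip totals with the 213 million per-module figure quoted earlier. The constant names are invented for illustration, and the assumption that everything outside the four modules goes to L3 cache, memory controller, and other shared logic is an inference, not an AMD figure.

```python
INTERLAGOS_TRANSISTORS = 2_400_000_000   # whole 16-core package, per the article
DIES_PER_INTERLAGOS = 2                  # two Valencia-class chips per package
MODULE_TRANSISTORS = 213_000_000         # one Bulldozer module, including its L2
MODULES_PER_DIE = 4

# Half the package total gives the single-chip count.
valencia_transistors = INTERLAGOS_TRANSISTORS // DIES_PER_INTERLAGOS

# Transistors accounted for by the four modules on one chip.
in_modules = MODULE_TRANSISTORS * MODULES_PER_DIE

# Assumption: the remainder is shared logic outside the modules.
outside_modules = valencia_transistors - in_modules
module_share = in_modules / valencia_transistors

print(f"Valencia chip:   {valencia_transistors:,}")
print(f"In four modules: {in_modules:,} ({module_share:.0%})")
print(f"Outside modules: {outside_modules:,}")
```

On those figures the four modules account for roughly seven-tenths of each chip.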
