Original URL: http://www.theregister.co.uk/2011/04/04/amd_opteron_server_update/

AMD gases up Bulldozers for Intel push back

Can Intel match 16 cores at 3.5GHz?

By Timothy Prickett Morgan

Posted in Servers, 4th April 2011 03:00 GMT

Advanced Micro Devices is in a number of tight spots at the moment, but the company is optimistic that its future "Bulldozer" Opteron processors, due later this year, will let it dig in and grab some desperately needed - and profitable - server market share from archrival Intel.

The first Bulldozer Opteron chips, the 16-core "Interlagos" processors, are still on track for production in the second quarter with a formal launch sometime in the third quarter, Vlad Rozanovich, director of the enterprise and public sector business at AMD, tells El Reg. These chips will be used in servers with two or four processor sockets and will plug into existing machines that use the G34 socket and that currently sport the 12-core "Magny-Cours" Opteron 6100s.

At some time after the Interlagos launch, AMD will kick out the eight-core "Valencia" processors for servers using the C32 socket, which currently use the six-core Opteron 4100 processor. The C32 machines are designed for lightweight, low-power, hyperscale workloads and come in variants with one or two sockets. The Interlagos and Valencia chips are being made by GlobalFoundries in AMD's former Dresden fab using 32 nanometer processes.

Intel is not sitting around waiting for AMD to launch its 32 nanometer chips. The chip giant has already spilled the beans on its Xeon E3-1200 processors, based on the "Sandy Bridge" design, as part of its sudden excitement about single-socket micro servers. And Intel is expected to bring its top-end Xeon E7 parts, based on the prior "Westmere-EX" design, to market soon, with a launch expected this week.

That leaves Intel with the belly of the market wide open until it can get the Xeon E5 processors for two-socket boxes to market, with these expected to be launched in the third quarter of this year - about the time the first Bulldozer Opterons appear.

That is a long time and a lot of marketeering before the Xeon E5s and Opteron 6200s (if that is indeed what AMD will call them) face off in the market on general-purpose two-socket boxes.

AMD thinks that it will have an advantage in the x64 core wars, and that its processors will be a better fit than current or future Xeons for more workloads than a lot of people might expect. Some of these workloads you would expect, such as virtualized server instances on public or private clouds, while others you might not, such as Monte Carlo simulations done by financial institutions.

The drum that AMD has been beating for years, and which will get all the louder as Bulldozer chips get closer, is that Intel's HyperThreading, which creates two virtual software threads for each core on a Xeon chip, is not as good as having two actual cores doing work. AMD thinks that the core design of the Bulldozer architecture, which puts two cores with some shared elements into a module and then puts multiple modules on a single chip, is better than Intel's approach, which puts fewer cores on a die but adds HyperThreading and uses a ring structure to link the cores together. (The Bulldozer design approach was detailed here, while the Sandy Bridge and Westmere-EX designs are here).

AMD is basically betting that having 16 cores running at 3.5GHz and above for G34 servers and eight cores for C32 servers is better than ten hyperthreaded cores on Westmere-EX or eight hyperthreaded cores on Sandy Bridge-EP processors. Intel has not discussed clock speeds for the Sandy Bridge-EP (Xeon E5) or Westmere-EX (Xeon E7) processors, but AMD could have a clock advantage on some parts as well as a core count advantage.

This is an interesting bet for AMD to have made, and if it turns out that AMD can deliver more cores at the same or an equivalent clock speed, with better or equal thermals and performance, at a better price, then AMD's board of directors will have some questions to answer about the firing of Dirk Meyer in January.

The big five ride

None of the top-five server makers hurt themselves getting their Opteron 4100 and 6100 products out the door last year, but Hewlett-Packard and Dell fielded a reasonably wide range as did server upstart Acer, all of which will be field upgradable to the Bulldozer chips this year. IBM put a single four-socket box out that crammed all the electronics and lots of memory into a 2U space, but did not put out any other machines, be they rack, tower, blade, or tray servers. Oracle has stopped making Opteron boxes in the wake of acquiring server maker Sun Microsystems in January 2010.

Rozanovich tells El Reg that the shift from dedicated server hosting to cloudy public infrastructure will play into AMD's favor, as will the increasing use of server virtualization inside the world's data centers.

"Three years ago, when companies outsourced their workloads, they wanted a real physical server," explains Rozanovich. "Now, with virtualized servers, people don't really care if they have a dedicated server. What they care about is a rock-solid service level agreement and the ability to expand and contract their workloads and control their costs." And, the hosters want to set up a more standardized infrastructure stack that lets them achieve efficiencies they could not with dedicated hosting. (All of those unused clock cycles, disk spins, and memory chips devoid of data are wasted money, sitting on the books, making the CFO grumble.)

If the success that AMD has had with the Opteron 6100s in certain hosting and HPC accounts is any indication, then AMD thinks it has a pretty good shot at a revival in its server biz thanks to what Rozanovich calls "straight through computing."

Take server virtualization, for example. At cloud providers, virtualized systems are now running at 80 to 90 per cent CPU utilization, according to Rozanovich, much higher than the 5 to 20 per cent utilization a typical x64 server had running a single workload. "When your CPU is running at that high utilization rate, HyperThreading doesn't work," says Rozanovich. "The system doesn't have time or the capacity to hyperthread. So now, people running normal virtualized server workloads are turning off HyperThreading on their Xeon-based servers, just like supercomputer shops have been doing for years."

For virtualized servers, having more cores in an Opteron box compared to an equivalent Xeon box gives AMD a slight advantage because companies tend to pin a virtual machine to a core, not a thread. So if AMD's server partners can put 48 Opteron 6100 cores in a 2U box compared to 32 cores for a Xeon 7500 machine - both with big wonking memories - AMD wins. (Of course, last year AMD lost the memory capacity war because its Opteron 6100 memory controllers topped out at 512GB compared to 1TB or sometimes 2TB for Xeon 7500 boxes. The Interlagos Opterons have a reworked DDR3 memory controller that will sport terabytes of main memory.)
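The core-pinning arithmetic is simple enough to put in a few lines. What follows is a minimal sketch, not anything AMD supplied: the function name `pinned_vm_capacity` is made up for illustration, and the four-socket breakdown (four 12-core Opteron 6100s against four eight-core Xeon 7500s) is an assumption chosen to match the 48-core and 32-core totals above.

```python
def pinned_vm_capacity(sockets, cores_per_socket, vcpus_per_vm=1):
    """Count the VMs a box can host when every vCPU is pinned to its own physical core.

    Hypothetical helper for illustration; HyperThreading threads are deliberately
    not counted, matching the "pin to a core, not a thread" practice described above.
    """
    physical_cores = sockets * cores_per_socket
    return physical_cores // vcpus_per_vm


# Assumed configurations: four sockets each, giving the article's 48 and 32 cores.
opteron_6100_box = pinned_vm_capacity(sockets=4, cores_per_socket=12)
xeon_7500_box = pinned_vm_capacity(sockets=4, cores_per_socket=8)

print(opteron_6100_box, xeon_7500_box)               # 48 32
print(f"{opteron_6100_box / xeon_7500_box:.2f}x")    # 1.50x
```

On those assumptions the Opteron box hosts half again as many single-vCPU virtual machines, and the gap stays proportional if each VM is given two pinned cores instead of one.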

The reworking of software to take advantage of more cores and threads is also helping AMD as much as it is helping Intel and suppliers of other processors. To illustrate how the strong core philosophy of AMD's chip design is panning out, Rozanovich uses Monte Carlo simulation, which is used by financial institutions to value their stock and bond portfolios and help them make trades as the markets move.

"The old rule for Monte Carlo was that the fastest frequency and the lowest latency always wins," says Rozanovich. "And so in 2005 and 2006, AMD won the majority of the Monte Carlo deals." That was particularly so when power was factored into the equation. While data centers can often cope with a rack of super-dense servers that burn 25,000 to 30,000 watts of juice, the data centers near Wall Street, the City of London, and other financial centers can typically only supply 9,000 watts per rack. So every watt and clock really counts.

But with the "Nehalem-EP" Xeon 5500 chip launch back in March 2009, AMD lost the clock and latency edge, thanks to the low-power Nehalem core and the QuickPath Interconnect for linking cores together out to main memory and peripherals. Thanks to the 18 to 24 month upgrade cycle that financial institutions have for their simulation platforms, though, AMD started to get traction among the Monte Carlo crowd with the dozen-core Magny-Cours chips - and this business is still building.

"Magny-Cours is getting an adoption rate that we haven't seen in a while," says Rozanovich.

Part of the reason, he says, is that programmers at financial institutions are learning how to program in parallel. "Some financial institutions running Monte Carlo simulations are now getting much better performance using the slower 12-core Magny-Cours than the faster eight-core versions," Rozanovich tells El Reg. "As developers get experience with parallelization, they are going to start programming to the cores."
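For readers wondering what "programming to the cores" looks like, here is a minimal sketch. It is not any bank's code: the workload is a toy, a single European call option priced under assumed lognormal dynamics, and the names `simulate_chunk` and `price_call` and all parameter values are invented for illustration. The point is only that Monte Carlo paths are independent, so they can be split into chunks and handed to one worker process per core.

```python
import math
import os
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(job):
    """Sum the discounted call payoffs over one independent batch of paths."""
    seed, n_paths, spot, strike, rate, vol, years = job
    rng = random.Random(seed)  # one seed per chunk keeps chunks independent and repeatable
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * years) * total


def price_call(n_paths, n_chunks, map_fn=map,
               spot=100.0, strike=100.0, rate=0.05, vol=0.2, years=1.0):
    """Average the payoffs from n_chunks equal batches; map_fn decides where they run."""
    per_chunk = n_paths // n_chunks
    jobs = [(seed, per_chunk, spot, strike, rate, vol, years)
            for seed in range(n_chunks)]
    return sum(map_fn(simulate_chunk, jobs)) / (per_chunk * n_chunks)


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # one chunk per core the operating system reports
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map farms the chunks out to separate processes, one per core.
        print(price_call(200_000, workers, map_fn=pool.map))
```

With the built-in `map` the chunks run one after another on a single core; passing `pool.map` spreads them across every core in the box. Because each chunk does the same work either way, more cores can beat a faster clock once the code is split this way, which is the effect Rozanovich describes.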

At financial institutions that are working on skinny thermal budgets, power consumption is driving what chips Intel and AMD design, according to Rozanovich. Some big banks, brokerages, and hedge funds are coming to them with power budgets and they are sitting down together to determine what number of cores at what clock speed they can deliver in a future chip. It isn't quite custom silicon, but both chip makers have to offer overclocking for high-speed trading systems and thermally conscious workhorses for Monte Carlo and other simulations. ®