AMD unimpressed with Intel six shooter
Barcelona beats Dunnington - sometimes
As you might expect, Advanced Micro Devices is keeping its chin up amid the ticktocking it's taking from rival Intel in the server space. This week's launch of the six-core "Dunnington" Xeon MP processors - which plug into four-socket and larger machines - made a lot of noise for Intel and its partners, but AMD wants you to be realistic about the prospects for processors in this part of the server space.
It also wants everyone to understand that the Dunnington Xeons aren't out-gunning its quad-core "Barcelona" Opterons by all that much - if at all.
First, here's a little server space DNA to keep in mind. According to John Fruehe, manager of worldwide market development for AMD and a former Compaq and Hewlett-Packard marketeer, the basic distribution of x86 and x64 server sales, quarter to quarter and year to year, is like this: About 20 per cent of the machinery shipped is single-socket boxes, about 70 per cent is two-socket boxes, and around 10 per cent is four-socket machines. That leaves just a tiny slice for eight-socket or larger x64 iron.
While the bigger boxes generate larger sales and lots of profits - which is why vendors even bother - big x64 iron is still a market that has yet to unseat RISC/Unix or proprietary iron. To be sure, AMD, Intel, and their respective partners have been trying to do this, but their efforts are often half-hearted since these same partners usually have a highly profitable non-x64 line to protect.
That server distribution has not changed all that much in the past decade or so, says Fruehe, but he does concede that the combination of virtualization and server consolidation is driving customers to buy bigger boxes than they might have only a few years ago.
This is true across all platforms, by the way. And mainframes and proprietary servers like IBM's AS/400 and its successors have already undergone the virtualization crunch (and their revenue drops in the past decade are due in part to the widespread adoption of virtualization and the consolidation of footprints).
But none of this is meant to imply that AMD doesn't like the four-socket server space. "While the eight-way server is not a volume market, and it is not growing dramatically, the four-socket server is still a real stronghold for us," says Fruehe. One of the reasons, he says, is that the HyperTransport interconnect and the integrated memory controllers of the Opteron architecture give AMD a performance advantage, clock for clock and core for core, compared to Xeons.
Intel's front side bus architecture just doesn't scale as well - and this matters on bigger boxes. That's why the future "Nehalem" Xeons and "Tukwila" Itaniums will use the QuickPath Interconnect, Intel's riff on the Opteron interconnect, including on-chip memory controllers.
To make up for the shortcomings in the 1.07GHz front side buses of the Dunnington Xeons, Intel has used the 45 nanometer process to squeeze up to 16MB of L3 cache on the chip. This cache helps mask I/O bottlenecks, and so do the three 2MB L2 caches that are shared by core pairs on the chip. But even with that, Fruehe is not impressed.
"Intel added 50 per cent more cores and got what looks like 31 per cent more performance on virtualization," he says. Of course, the average workload is seeing around 35 per cent more work being done, which is better than the virtualization benchmarks show, and databases are seeing as much as a 50 per cent boost according to Intel's test.
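To put Fruehe's complaint in numbers, here is a rough sketch of the arithmetic in Python. It uses only the percentages quoted above; the function name and the workload labels are ours, and it assumes the gains are measured against a quad-core baseline, which is what a 50 per cent core increase implies.

```python
def per_core_ratio(core_gain, perf_gain):
    """Throughput per core after the change, relative to before it."""
    return (1 + perf_gain) / (1 + core_gain)

CORE_GAIN = 0.50  # four cores to six, per the quote

# Performance gains as quoted in the article (labels are illustrative)
workloads = {"virtualization": 0.31, "average": 0.35, "database": 0.50}

ratios = {name: per_core_ratio(CORE_GAIN, gain) for name, gain in workloads.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x throughput per core")
```

On those figures, each core delivers roughly 13 per cent less on virtualization and 10 per cent less on the average workload than before, and only the best-case database result scales in line with the core count.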
What AMD intends to focus on to sell Barcelonas against Dunningtons is heat. The "Tigerton" quad-core Xeon MP predecessor to Dunnington was implemented in 65 nanometer processes, and Intel could ship standard parts at an 80 watt thermal design point (TDP). Dunnington, using a 45 nanometer process that implies a cooler chip, actually has a 90 watt TDP because of all that extra cache and the extra two cores. The top-end six-core Dunnington chip, which runs at 2.66 GHz, is still a 130-watt part, just like the fastest Tigerton.
"The majority of the world looks at the top bin parts and tells vendors that these are great for benchmarks, but these are not the chips that most people buy," says Fruehe. As you might expect, these expensive parts run a bit faster, but a whole lot hotter. The standard parts are where AMD and Intel are competing for most sales.
What Intel does have and what AMD has not been able to get since the advent of the Opteron line is big iron based on its x64 chips. IBM, NEC, and Unisys have 16-socket Xeon boxes that could use Tigerton and now Dunnington chips, while the biggest commercial Opteron boxes are the eight-socket DL785 from Hewlett-Packard and X4600 from Sun Microsystems. Motherboard makers Tyan and Super Micro also sell eight-way machines and boards to OEMs.
AMD is also planning to remind customers that the Dunnington machines use Fully Buffered DIMMs, which consume 10 to 11 watts per memory stick, compared to the DDR2 main memory used with Opterons, which consume 4 to 5 watts per stick of the same capacity. All this heat adds up inside a big box.
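How much that adds up to depends on how many memory slots are filled. A quick back-of-the-envelope sketch, using the per-stick wattages AMD cites and a hypothetical box with 32 DIMMs (the DIMM count is our assumption for illustration, not a figure from AMD or Intel):

```python
# Per-stick power ranges in watts, as cited by AMD in the article
FBDIMM_WATTS = (10, 11)
DDR2_WATTS = (4, 5)

def extra_memory_watts(dimm_count):
    """Return the (low, high) extra watts FB-DIMMs draw versus DDR2."""
    low = (FBDIMM_WATTS[0] - DDR2_WATTS[1]) * dimm_count   # best case for FB-DIMM
    high = (FBDIMM_WATTS[1] - DDR2_WATTS[0]) * dimm_count  # worst case for FB-DIMM
    return low, high

# Hypothetical configuration: 32 DIMM slots, all populated
low, high = extra_memory_watts(32)
print(f"Extra memory power: {low} to {high} watts")
```

For that assumed configuration the gap works out to somewhere between 160 and 224 watts for memory alone, before counting the processors.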
While the current Barcelona Opterons, which top out at 2.3GHz, have to fend off the Dunningtons at the high end and the Harpertown Xeons in the two-socket space, help is on the way. Fruehe reiterated that the "Shanghai" 45 nanometer shrink of the Barcelona chips is expected in two-socket (2000 series) and four-socket and larger (8000 series) servers in the fourth quarter of this year.
AMD will not only ship these Shanghai chips for revenue, but AMD's partners will have boxes in the field before the end of the year. The Shanghai Opterons will have 6MB of L3 cache (triple that in the Barcelonas), higher clock speeds, and various tweaks in the instruction stream that will deliver somewhere between 15 and 30 per cent more performance for Opteron customers. ®
Windows vs Linux
So true. I have read several reports that the typical IT department needs three times as many servers running that other OS as GNU/Linux. One thing these big chips with huge caches will do for GNU/Linux is to allow more users to run on a single GNU/Linux terminal server. With dual-socket-quad-core a fairly large school can run most of the desktops from a single server. That is performance. Hex/octal core chips will be able to do that with a single socket and we can use a second machine for backup for very little cost compared to a bigger cluster of servers.
Windows vs Linux
If you want to run the most powerful Windows based server then you really need compatible CPU cores like these. However, if you base your server on Linux then your choice really opens up. Linux is compiled for pretty much every CPU available. PowerPC, for example: CPUs that can be arranged in vastly multi-CPU configurations. It always seems to me that the CPU makers have to work a lot harder to make their chips Windows compatible compared to how hard the Linux compilers have to work to support a new CPU.
Effectively with Linux (or Unix) you can simply design the most powerful CPU or cluster, forgetting backward compatibility, then put Linux on it. With Windows you have to design to the machine code already in existence.