AMD's Opteron 4100s march into x64 price war
Battle for the clouds
The big guns are already on the field in the x64 server processor war, and the troops are finally going all in with today's launch by Advanced Micro Devices of its entry "Lisbon" Opteron 4100s.
The Opteron 4100s are similar in many respects to the "Magny-Cours" Opteron 6100s that entered the battle against Intel's "Westmere-EP" Xeon 5600 and "Nehalem-EX" Xeon 7500 processors in March. Both chips are implemented in a 45 nanometer silicon on insulator process and manufactured by GlobalFoundries, the chip foundry that AMD spun out last year.
The Lisbon cores are quite similar to those used in the six-core "Istanbul" Opteron 2400 and 8400 processors from a year ago, with the transistor counts and processor areas being essentially the same, as are the cache memories. The big change is the shift from DDR2 to DDR3 memory for the integrated memory controllers. The Istanbul chips already supported HyperTransport 3 (HT3) point-to-point interconnect links, although the chipsets that the Istanbuls talked to did not. (They had a backwards compatibility mode).
The Magny-Cours chips, which come in variants with eight or twelve cores per socket, basically cram two Lisbon processors side-by-side in a single chip package and slap them into a new G34 socket that is serviced by AMD's own chipsets. The Lisbon chip comes with either four or six cores per socket on a single die. Both AMD processors have 64 KB of L1 data and 64 KB of L1 instruction cache per core, plus 512 KB of L2 cache per core. The Lisbon chip has 6 MB of L3 cache per processor package, and the Magny-Cours, being a double-stuffed socket, has 12 MB of L3 cache per socket.
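To put the cache figures above in one place, here is a rough back-of-the-envelope sketch that adds up the on-chip cache per socket. It assumes the caches are simply additive and that the six-core Lisbon and twelve-core Magny-Cours parts are the ones being compared. The function name and constants are illustrative only and do not come from AMD, and the tally ignores any L3 capacity the chip might set aside for other purposes.

```python
# Cache sizes as quoted in the article, in KB.
L1_DATA_KB = 64            # L1 data cache per core
L1_INSTR_KB = 64           # L1 instruction cache per core
L2_KB = 512                # L2 cache per core
L3_PER_DIE_KB = 6 * 1024   # 6 MB of L3 per Lisbon die


def total_cache_kb(cores: int, dies: int = 1) -> int:
    """Sum L1 and L2 over all cores, plus L3 over all dies in one socket.

    Hypothetical helper: treats every cache level as simply additive.
    """
    per_core_kb = L1_DATA_KB + L1_INSTR_KB + L2_KB
    return cores * per_core_kb + dies * L3_PER_DIE_KB


# Six-core Lisbon: one die per C32 socket.
lisbon_six = total_cache_kb(cores=6, dies=1)
# Twelve-core Magny-Cours: two dies per G34 socket.
magny_twelve = total_cache_kb(cores=12, dies=2)

print(f"Six-core Lisbon: {lisbon_six / 1024:.2f} MB per socket")
print(f"Twelve-core Magny-Cours: {magny_twelve / 1024:.2f} MB per socket")
```

Under these assumptions the twelve-core package carries exactly twice the cache of the six-core one, which is what you would expect from two dies sharing a socket.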
The Opteron 6100s are aimed at standard platforms that need lots of cores, clocks, and memory to support big databases or server virtualization hypervisors that in turn have lots of virtual machines running atop them. On these machines, raw performance and performance per watt are the two key metrics, which is why AMD created the Opteron 6100s to support both two-socket and four-socket servers with the same chips and chipset.
Although, if you think about each physical chip as its own processor and look at the HyperTransport links coming into each socket, you could argue that what AMD has really done with the Opteron 6100s is crunch an eight-socket box into four sockets and a four-socket box into two sockets.
This is a great strategy for customers who pay for their software based on socket count, and for anyone who wants to make a cheap four-socket box that can compete against Intel's more expensive Xeon 7500s. These latter chips from Intel have Itanium prices and scalability, but Xeon instruction set compatibility. AMD figured that the best way to undercut both the Xeon 5600s and the Xeon 7500s was to halve the socket count by doubling up the processors for the Opteron 6100s, while at the same time creating an inexpensive, low-power lineup for single- and dual-socket servers using the Opteron 4100s. The market will decide if this was indeed the right move.
What AMD has said this year, and what is no doubt true, is that it needs to get more market share in the server racket and it has to compete on price and performance to do that. However, it may just be that what many server customers want more than anything now is memory expandability within a 2P or 4P box. The memory controllers inside the Opteron 4100 and Opteron 6100 processors can't address more than 512 GB, while a number of server OEMs, wanting to catch the server virtualization wave, have put two-socket and four-socket Xeon 7500 machines into the field that address 1 TB or more of memory.
AMD shops will have to wait until next year, when the "Bulldozer" cores and their new on-chip memory controller arrive, for Opteron memory addressing to go beyond 512 GB.
Memory capacity could be a problem for the Opteron 6100s, but it is not really so much of an issue for the Opteron 4100s. These Lisbon chips are aimed at scale-out, web-ish, HPC workloads where the cost per performance per watt, the physical size (smaller is better), and the lowest cost of acquisition and operation for a server node are the most important architectural factors. Because the Lisbon sockets have fewer cores, the clock speeds can be jacked up higher, which means better performance per core on certain kinds of workloads (particularly those doing floating point calculations) compared to the Opteron 6100s.
904 million transistors
Here's the block diagram of the Opteron 4100 chip, with the top of the chip package shown as an insert:
Here's what the chip itself looks like, and you would probably be hard-pressed to tell the difference between it and a six-core Istanbul from last year:
The Opteron 4100 has 904 million transistors implemented on its die, which has an area of 346 square millimeters. The chip has integrated memory controllers on each core, which support DDR3 main memory in either standard 1.5 volt or low-voltage 1.35 volt sticks. Memory can run as fast as 1.33 GHz. Each 1,207-pin C32 socket has 21.3 GB/sec of memory bandwidth over two memory channels and has two x16 HT3 links for up to 6.4 gigatransfers per second (GT/sec) of point-to-point bandwidth. The Opteron 6100 has twice as many HT3 links and twice as many memory ports to double up its memory and interconnect bandwidth, which stands to reason given that the Opteron 6100 is really just two Opteron 4100s snuggling in a single 1,944-pin G34 socket.
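The 21.3 GB/sec figure falls out of simple arithmetic, sketched below. The sketch assumes that the 1.33 GHz memory speed refers to roughly 1,333 million transfers per second and that each DDR3 channel is 64 bits (8 bytes) wide. Neither detail is spelled out in the text above, and the function name is made up for illustration. The result is a theoretical peak, not a measured number.

```python
# Assumption: a DDR3 channel moves 64 bits (8 bytes) per transfer.
BYTES_PER_TRANSFER = 8
# Assumption: "1.33 GHz" memory means about 1,333 million transfers per second.
MEGA_TRANSFERS_PER_SEC = 1333.33


def peak_memory_bandwidth_gb_s(channels: int, mega_transfers: float) -> float:
    """Theoretical peak bandwidth in GB/sec (decimal gigabytes)."""
    bytes_per_sec = channels * mega_transfers * 1e6 * BYTES_PER_TRANSFER
    return bytes_per_sec / 1e9


# C32 socket (Opteron 4100): two memory channels.
c32 = peak_memory_bandwidth_gb_s(2, MEGA_TRANSFERS_PER_SEC)
# G34 socket (Opteron 6100): twice as many memory ports, so four channels.
g34 = peak_memory_bandwidth_gb_s(4, MEGA_TRANSFERS_PER_SEC)

print(f"C32 peak: {c32:.1f} GB/sec")
print(f"G34 peak: {g34:.1f} GB/sec")
```

With those assumptions the two-channel C32 socket lands on the quoted 21.3 GB/sec, and doubling the channel count for the G34 socket doubles the peak.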
There are nine Opteron 4100 processors, although one that was expected to be launched today – a 2.9 GHz standard part called the Opteron 4186 with a $455 price tag – did not make the cut. Here are the ones AMD is putting into the field:
The AMD "Lisbon" Opteron 4100 processors
As you can see from the table above, there are three different variants of the Opteron 4100s: standard, HE, and EE parts. In the Opteron 4100 lineup, the standard thermal envelope parts are rated at 75 watts using AMD's Average CPU Power or ACP test. The HE, short for Highly Efficient, versions of the chips have an ACP rating of 50 watts. The EE, short for Extremely Efficient, Opteron 4100s are rated at a mere 32 watts. The prices of the Opteron 4100s are much lower than those of the Opteron 6100s, and the thermals are better too, as you can see from looking at the Opteron 6100 table below:
The AMD "Magny-Cours" Opteron 6100 processors
In terms of price per clock, the Opteron 4100s are going to win out over the Opteron 6100s in almost every case, and that is so by design given the intended customer set: those building high-density twin rack servers or skinless servers, looking to maximize the number of server units in a brick-and-mortar or containerized data center.
Because low heat and low price are more important than anything with these customers, AMD is not creating a high-cost, high-wattage Special Edition (SE) part for the Opteron 4100 processors, as it did with the Opteron 6100s. Conversely, there is no Opteron 6100 EE part since customers buying four-socket servers tend to want to maximize performance.
According to Margaret Lewis, director of software product marketing at AMD, four motherboard makers are stepping up to make boards using AMD's chipsets and C32 sockets to support the Opteron 4100 processors. (Given that the C32 socket is not all that different from the Rev F socket, this should not be a big deal). Gigabyte, MSI, Super Micro, and Tyan are all doing Opteron 4100 boards with lots of power optimizations, and Super Micro is expected to also do bare-bones platforms as it always does with any new x64 chip.
Dell's Data Center Solutions unit, which makes custom servers for hyperscale data centers and which accounts for a sizeable portion of the company's server unit shipments every quarter, has committed to using the Opteron 4100 in its boxes. Upstart server maker Acer, which carried the Opteron 6100 banner into the x64 war back in March, is also picking up the Opteron 4100 and has committed to delivering power-optimized twin servers (which put two servers side-by-side in a single 1U chassis, or sometimes four units in a 2U chassis) and tower machines for small and medium businesses in the second half of 2010.
Power optimization means stripping USB ports, serial ports, parallel ports, service processors, and any other unnecessary items from the system board to reduce its power usage. Lewis says that additional tier one server makers are expected to put out Opteron 4100 products "in the coming months."
The problem with focusing on the cloud market is that the companies building or using Opteron 4100 systems, as well as those based on earlier energy-efficient Opteron designs, are unwilling to talk about what they are doing. (Indeed, it took El Reg a year of pestering to finally wear Dell DCS down enough to admit it used Opterons in at least some of its bespoke cloud servers).
"What you're using in the servers tells competitors what you are doing to get an edge and reach higher levels of energy efficiency," says Lewis. "But what I can tell you is that we have at least 2 million Opteron processors in cloud providers today."
That's somewhere on the order of a million boxes – and that is a pretty good business, even for Intel, much less AMD.
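The million-box estimate is a simple division, and it depends entirely on how many sockets you assume the average cloud server has. The sketch below makes that assumption explicit. The two-socket average is a guess consistent with the estimate above, not a figure AMD has disclosed, and the function name is hypothetical.

```python
# Figure quoted by AMD: at least 2 million Opterons in cloud providers.
OPTERONS_IN_CLOUDS = 2_000_000


def estimate_servers(processors: int, sockets_per_server: int) -> int:
    """Rough server count, assuming every socket in every box is populated."""
    return processors // sockets_per_server


# Assumption: cloud boxes are overwhelmingly two-socket machines.
two_socket_boxes = estimate_servers(OPTERONS_IN_CLOUDS, 2)

# The estimate moves a lot if the mix leans single-socket or four-socket.
for sockets in (1, 2, 4):
    boxes = estimate_servers(OPTERONS_IN_CLOUDS, sockets)
    print(f"{sockets}-socket average: about {boxes:,} boxes")
```

A fleet skewed toward single-socket nodes would push the count toward two million machines, while four-socket boxes would pull it down to half a million.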
The good news for customers buying or building Opteron 4100 systems today is that AMD is committing to the C32 socket, and the future "Valencia" Opterons – presumably the 4200s – due in 2011 with six or eight cores will plunk right into existing C32 systems with the SR5600 series chipsets from AMD. ®