Intel pushes workhorse Xeons to six cores
Go Westmere, young man
The first volley in the volume x64 server price war was officially fired today, with Intel rolling out its "Westmere-EP" Xeon 5600 processor. Rival Advanced Micro Devices is widely expected to counter with its "Magny-Cours" Opteron 6100 processors on March 29, to be followed by the long-awaited launch of Intel's "Nehalem-EX" Xeon 7500s on March 30.
The Xeon 5600s are the kickers to the very successful Xeon 5500s, the first server chips Intel got onto the field with the much-needed QuickPath Interconnect. QPI is important because it got processor cores and memory bandwidth back into whack after being out of kilter for years with the old Xeon frontside bus architecture.
With the Xeon 5600s, Intel is increasing the core count from four to six with the top-end parts, but the memory slots per socket remain the same. With 4GB DDR3 DIMMs being affordable and 8GB DIMMs being merely expensive instead of outrageous - as they were a year ago - Intel is counting on DDR3 DIMM capacities to make up for holding the memory slots constant. Moreover, it's also counting on server OEMs being thrilled that they merely have to drop the Xeon 5600s into the same machines they created to support the Xeon 5500s, since they are socket-compatible.
Intel stole a whole bunch of its own thunder for the Xeon 5600 launch back in early February, when it talked about the power gating and security features of the chip at the International Solid State Circuits Conference in San Francisco. The "transformational" Xeon 5500s launched with much anticipation in March 2009 and provided a much-needed goose to the server racket that had been hammered into the ground by the economic meltdown. Intel and its partners are hoping that the Westmere-EP follow-ons can keep building momentum for x64 server sales.
The Xeon 5500s had two or four cores, 4MB or 8MB of L3 cache, and their 730 million transistors were implemented in a 45 nanometer high-k process. Having perfected its 32 nanometer high-k metal gate processes late last year with desktop and laptop processors that were announced in January, Intel is deploying the next rev of its 32 nanometer processes to make the Xeon 5600s. That 45-to-32 nanometer process shrink, combined with better power gating to core and now non-core parts of the chip (allowing for the quiescing of segments of the chip that are not in use), means Intel can boost the maximum core count to six and pump the maximum L3 cache size up to 12MB and still stay in the same thermal envelope.
The 32 nanometer, six-core Westmere-EP chip
The Xeon 5600 weighs in at 1.17 billion transistors and is 240 square millimeters in size. It is implemented in two halves of three cores each, as you can see. The core regions have their own clock speed and power supply, and with the tweaks to the Westmere design the L3 cache and memory controller regions - what Intel calls the "uncore" areas - get their own separate power gating. This allows Intel to be a whole lot more stingy about power usage with the Xeon 5600s.
As El Reg previously reported, the Xeon 5600s have had their on-chip DDR3 main memory controllers tweaked so they can support low-voltage DDR3 main memory. This low-voltage memory runs at 1.35 volts instead of the 1.5 volts of standard DDR3 chips. The net effect, Intel said back at ISSCC, is that memory DIMMs run about 20 per cent cooler when using the low-voltage parts without sacrificing performance. Now, however, the company is only claiming a 10 per cent savings in power.
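As a rough sanity check on those two voltages, and not anything Intel presented, here is a small Python sketch. It assumes the textbook rule of thumb that the dynamic power of CMOS logic scales with the square of the supply voltage when frequency and capacitance are held constant. Real DIMM power also includes components that do not scale this way, so treat the output as a ballpark only. The function names are made up for illustration.

```python
# Supply voltages quoted in the article, in volts.
STANDARD_DDR3_V = 1.5
LOW_VOLTAGE_DDR3_V = 1.35


def voltage_drop_pct(v_old, v_new):
    """Percentage reduction in supply voltage."""
    return (1 - v_new / v_old) * 100


def dynamic_power_drop_pct(v_old, v_new):
    """Percentage reduction in dynamic power, ASSUMING power ~ V^2.

    This is a rule-of-thumb model, not a measurement and not Intel's method.
    """
    return (1 - (v_new / v_old) ** 2) * 100


voltage_drop = round(voltage_drop_pct(STANDARD_DDR3_V, LOW_VOLTAGE_DDR3_V), 1)
power_drop = round(dynamic_power_drop_pct(STANDARD_DDR3_V, LOW_VOLTAGE_DDR3_V), 1)

print(f"Voltage reduction: {voltage_drop} per cent")
print(f"Dynamic power reduction under the V^2 assumption: {power_drop} per cent")
```

Under that simple model, a 10 per cent cut in voltage works out to a cut of a little under a fifth in dynamic power. Whether that is how Intel arrived at either of its claims is not something the company has said.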
The Xeon 5600 processors, Intel divulged back in February, have a set of native cryptographic instructions that implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data. But in a conference call with journalists, Boyd Davis, general manager of marketing for Intel's Data Center Group, said that the company has also grabbed its Trusted Execution Technology (TXT) security features from the vPro business PC platform and hardened them so they can be used to secure virtualized server environments. Specifically, the TXT functions built into the Xeon 5600 platform can be used to prevent the insertion of malicious software prior to the launching of the hypervisor when a machine boots.
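For readers who want to see whether a given box exposes these features, here is a minimal Python sketch. It assumes a Linux system, where the kernel lists CPU feature flags in /proc/cpuinfo: `aes` for the AES instructions and `smx` for the Safer Mode Extensions that TXT relies on. The helper names and the sample text are hypothetical. A flag only shows that the silicon advertises the feature, not that the BIOS, chipset, and hypervisor are set up to use it.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def security_features(cpuinfo_text):
    """Report whether the AES and SMX (used by TXT) flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"aes": "aes" in flags, "smx": "smx" in flags}


# Hypothetical excerpt in the style of /proc/cpuinfo, for illustration only.
SAMPLE = "model name\t: Example CPU\nflags\t\t: fpu vme sse2 aes smx\n"

result = security_features(SAMPLE)
print(result)
```

On a real Linux machine, the same function could be fed the output of `open("/proc/cpuinfo").read()` instead of the sample string.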
Here's how the Xeon 5600s stack up, and how they compare to the Xeon 5500s and 3400s that have not been replaced in the lineup:
The current Intel one-socket and two-socket server and workstation chip lineup
As with the Xeon 5500s, not every feature is enabled in every chip. In the table above, TDP is Intel's thermal design power rating, in watts. TB is short for Turbo Boost, which is a feature of Xeon and Core chips that allows some cores to run faster when other cores are turned off. HT is short for HyperThreading, which is Intel's implementation of simultaneous multithreading and which makes a single core look like two cores as far as the operating system is concerned. AES is short for the AES encryption instructions, and TXT is short for the Trusted Execution Technology, both of which are new with the Xeon 5600s. The number of cores and threads activated in each chip is also shown, as is the unit price for each chip when the chips are bought in 1,000-unit trays from Intel.
Banging heads against the thermal wall
There are a couple of things to notice about the Xeon 5600 lineup. First, while Intel is offering four-core variants of the chips with slight clock speed advantages, those extra clocks presumably do not give all that much extra performance. Consider the Xeon X5677, with four cores sharing that 12MB L3 cache (3MB per core) and running at 3.46GHz, against the six-core X5680 (with only 2MB of cache per core) running at 3.33GHz. Shedding two cores buys only 130MHz of extra clock speed. That illustrates why all chip makers have hit the thermal wall and have little choice but to increase core counts and hope that virtualization and workload consolidation put off the day of reckoning that is surely coming when programmers cannot make use of the extra cores and threads in a chip.
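To put numbers on that trade-off, here is a quick Python sketch using only the core counts, clock speeds, and cache size quoted above. The "aggregate GHz" figure is a deliberately naive yardstick that assumes a workload scales perfectly across cores, which real software rarely does. The variable and function names are made up for illustration.

```python
L3_CACHE_MB = 12  # shared L3 cache on both chips, per the article

# (cores, clock speed in GHz) as quoted in the article
CHIPS = {
    "X5677": (4, 3.46),
    "X5680": (6, 3.33),
}


def summarize(cores, ghz):
    """Cache per core, plus a naive cores-times-clock throughput yardstick."""
    return {
        "cache_per_core_mb": L3_CACHE_MB / cores,
        "aggregate_ghz": round(cores * ghz, 2),  # ASSUMES perfect scaling
    }


summary = {name: summarize(*spec) for name, spec in CHIPS.items()}

# Clock speed edge of the four-core part over the six-core part, in per cent.
clock_edge_pct = round((CHIPS["X5677"][1] / CHIPS["X5680"][1] - 1) * 100, 1)

for name, stats in summary.items():
    print(name, stats)
print(f"X5677 clock advantage: {clock_edge_pct} per cent")
```

By this crude measure, the four-core part's clock advantage of a few per cent comes nowhere near offsetting the loss of a third of the cores, provided the workload can actually keep six cores busy.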
As with the Nehalem-EP chips a year ago, the Westmere-EP chips include a single-socket workstation and server part rated at 130 watts (the W3680) and low-wattage parts. The Nehalem-EPs had standard parts that burned at 80 watts or 95 watts, plus one 130 watt part and two low-voltage parts with four cores that burned only 60 watts. With the Westmere-EPs, the two 130 watt parts are now standard server parts, carrying the X designation rather than the W designation that marks workstation-only chips.
There are similarly 95 watt and 80 watt Westmere-EPs with four or six cores as well as the entry-level Nehalem-EP parts with two or four cores that are still in the lineup for two-socket machines. With the Westmere-EPs, there is one 60 watt part, the L5640, which has all six cores fired up and runs at 2.26GHz, and a four-core L5630 that runs at 2.13GHz and is rated at 40 watts. The even cheaper L5609 chip, also a 40 watter with four cores, doesn't have Turbo Boost, HyperThreading, AES, or TXT activated, and runs at 1.86GHz.
Davis said in the call that the pricing for the Xeon 5600 processors was roughly the same as what Intel is charging for an equivalent clock speed and feature set (minus the cores and cache, of course) in the Xeon 5500 line. This is not precisely true. Intel is most definitely charging a slight premium for some Xeon 5600s, and is no doubt justified in doing so based on the multithreaded performance gains that the chips offer - somewhere between 20 and 63 per cent on various HPC workloads and roughly 40 per cent for more mainstream infrastructure workloads running predominantly on Windows and Linux operating systems. Some Xeon 5600s - those that have only four cores - are cheaper than their Xeon 5500 counterparts.
Let's take the top-end parts. A four-core Nehalem-EP W5590 running at 3.33GHz costs $1,600, but the X5680 running at the same clock speed with six cores costs $1,663, a 3.9 per cent premium. (The Westmere-EP X5677, with only four cores turned on and running at 3.46GHz, costs the same $1,663.) The mainstream high-end four-core Nehalem-EP part in the 95 watt power band is the X5570, which spins at 2.93GHz and which costs $1,386. The X5670 has six cores and costs $1,440 - again a 3.9 per cent premium. The top-bin 80 watt part in the Nehalem-EP lineup was the E5540, running at 2.53GHz and costing $744; the equivalent Westmere-EP part, the E5640, spins at a slightly higher 2.66GHz and costs $774, which is a 4 per cent premium for 5.1 per cent faster clocks. (Seems reasonable.) The four-core Westmere-EPs known as the E5630 (2.53GHz) and E5620 (2.4GHz) have all the extra goodies on them and cost $551 and $387, respectively; the most similar 80 watt Nehalem-EP parts, the E5540 and E5530, cost $744 and $530, respectively. So basically, Intel has shifted the prices for these processors down one bin level with some wiggling.
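The premium arithmetic above is easy to reproduce. Here is a short Python sketch that uses only the 1,000-unit tray prices and clock speeds quoted in this article. The power-band labels and function name are ours, chosen for illustration, and the pairings follow the comparisons made in the text rather than any official Intel mapping.

```python
# (Nehalem-EP price, Westmere-EP price) in US dollars, as quoted in the article.
PRICE_PAIRS = {
    "130 watt top bin": (1600, 1663),
    "95 watt high end": (1386, 1440),
    "80 watt top bin": (744, 774),
}


def pct_increase(old, new):
    """Percentage by which new exceeds old."""
    return (new - old) / old * 100


premiums = {
    band: round(pct_increase(old, new), 1)
    for band, (old, new) in PRICE_PAIRS.items()
}

# Clock speed gain for the 80 watt pair (2.53GHz versus 2.66GHz).
clock_gain_80w = round(pct_increase(2.53, 2.66), 1)

for band, pct in premiums.items():
    print(f"{band}: {pct} per cent premium")
print(f"80 watt clock gain: {clock_gain_80w} per cent")
```

Rounded to one decimal place, the three premiums come out at 3.9, 3.9, and 4.0 per cent, with a 5.1 per cent clock bump on the 80 watt pair, which matches the figures in the text.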
Another thing to notice from the table above: the six-core E5645 and L5638 as well as the quad-core L5618 and E5620 processors are designated as enterprise-class embedded processors, which are aimed at thermally constrained physical environments. Intel is promising to sell these processors for seven years to get embedded system makers to adopt them in their products.
Intel is also chasing the micro-server segment and has pumped out a two-core, four-thread Xeon L3406 processor, which runs at 2.26GHz and is rated at 30 watts. It costs a mere $189 a pop if you buy them in 1,000-unit trays.
Finally, Intel is also delivering the Core i7-980X Extreme Edition processor, which the company has been showing off to the gaming community. The i7-980X sports six cores running at 3.33GHz and plunks into existing machines that use the Intel X58 Express chipset. It costs $999 each in 1,000-unit quantities. ®