Xeon E7 servers run with the big dogs
Gives chase to RISC and Itanium foxes
Deep Dive: Intel has come a long way in the server racket, and the new "Westmere-EX" Xeon E7 processor, launched in April and making its way into systems now, is arguably its most sophisticated processor for servers to date.
The Xeon E7 processors cram ten cores onto a single die, but the Xeon E7 design is a bit more than taking an eight-core "Nehalem-EX" Xeon 7500 processor and cramming two more cores onto the chip.
This complete system design marries lots of cores and execution threads to big gobs of shared L3 cache and the high-bandwidth QuickPath Interconnect (QPI), allowing vendors to create machines that can, in theory, scale from 2 to 32 processor sockets in a single system image.
This kind of scalability is not available in systems based on current Xeon 5600 processors from Intel or those based on Opteron 6100 processors from AMD.
With their high core, thread, and socket counts, large memory capacity and bandwidth, and high I/O bandwidth, Xeon E7 systems rival all but the very biggest RISC/Itanium or mainframe systems. And Xeon E7 machines can do it running Linux or Windows, which have been tuned to scale across the cores, threads, and sockets.
Nehalem-EX chip architecture
With the "Nehalem-EX" Xeon 7500 processors announced in March 2010, Intel at last shifted its high-end chips away from the slow front side bus architecture.
For a long time this was a limitation in system scalability, requiring vendors to make their own chipsets and L4 caching mechanisms if they wanted to build machines with more than four sockets.
The Nehalem-EX processors also introduced a new chip architecture that put the processor cores on the outside of the chip and the shared L3 caches on the inside of the chip, linked by a super-fast ring interconnect.
This lets the cores share data more efficiently than prior designs, which had cores and their L3 caches cookie-cuttered every which way on the die. This L3 cache ring interconnect design with cores out is to be used in all new high-end Xeon and Itanium processors.
The Xeon E7 chip that came out this year was designed in Intel's Bangalore chip lab and is manufactured using the company's 32 nanometer processes.
The chip weighs in at a whopping 2.6 billion transistors, not much more than the 2.3 billion transistors in the prior Xeon 7500 chips. Those extra 300 million transistors were used to put two more cores onto the chip and to boost the aggregate L3 cache size by 25 per cent to 30MB.
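For those who like to check the sums, here is a minimal Python sketch of the arithmetic using only the figures quoted above. The function names are made up for illustration, and the prior-generation cache size is derived from the 25 per cent figure rather than taken from an Intel spec sheet.

```python
# Figures quoted in the article; everything else is derived from them.
E7_TRANSISTORS = 2_600_000_000
X7500_TRANSISTORS = 2_300_000_000
E7_L3_MB = 30          # aggregate L3 cache on the Xeon E7
L3_BOOST = 0.25        # "25 per cent" increase over the Xeon 7500
E7_CORES = 10
X7500_CORES = 8


def extra_transistors() -> int:
    """Transistors added going from the Xeon 7500 to the Xeon E7."""
    return E7_TRANSISTORS - X7500_TRANSISTORS


def implied_x7500_l3_mb() -> float:
    """L3 size the Xeon 7500 must have had if 30MB is a 25 per cent boost."""
    return E7_L3_MB / (1 + L3_BOOST)


def core_increase() -> float:
    """Fractional increase in core count, eight cores to ten."""
    return (E7_CORES - X7500_CORES) / X7500_CORES


def l3_mb_per_core(l3_mb: float, cores: int) -> float:
    """Shared L3 cache divided evenly across the cores."""
    return l3_mb / cores


if __name__ == "__main__":
    print("extra transistors:", extra_transistors())
    print("implied Xeon 7500 L3 (MB):", implied_x7500_l3_mb())
    print("core increase:", core_increase())
    print("L3 per core, E7 (MB):", l3_mb_per_core(E7_L3_MB, E7_CORES))
    print("L3 per core, 7500 (MB):",
          l3_mb_per_core(implied_x7500_l3_mb(), X7500_CORES))
```

Cores and cache both grow by a quarter, so on these numbers the L3 cache per core does not change between the two generations.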
Strangely enough, the shrink from the 45 nanometer processes used with the Xeon 7500 chips in 2010 to the 32 nanometer processes used for the Xeon E7s in 2011 was not used to add more than two cores to the design, although the die layout makes it look like the Xeon E7 was actually a twelve-core design with the bottom two cores chopped off.
So why isn’t the Xeon E7 a twelve-core chip, as you might expect?
First, Linux and Windows and the several virtual machine hypervisors can only scale so far right now, so adding a lot more cores does not necessarily get Intel anywhere.
Moreover, as much as customers like more cores – especially in virtualized environments, where they tend to pin one virtual machine on one core for the sake of simplicity – there are plenty who want higher clock speeds.
So Intel used some of the shrink from 45 nanometers to 32 nanometers to boost the core count by 25 per cent and to boost the clock speeds by between 6 and 13 per cent.
And, the die size was also reduced – from 684 square millimeters for the Xeon 7500 to 513 square millimeters – and that means Intel can cram more chips on a single 300-millimeter wafer. That cuts wafer costs.
At the same time, a smaller die size increases Intel's Xeon E7 yields because, in theory, using a mature process (as the 32 nanometer wafer baking process is thanks to its ramp last year on PC chips) combined with a smaller die lowers the probability that some booger will screw up all or some of a particular Xeon E7 chip.
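To put rough numbers on the wafer and yield argument, here is a back-of-the-envelope Python sketch. It uses a common textbook approximation for gross dies per wafer and a simple Poisson yield model. Neither is Intel's own method, and the defect density is a made-up placeholder, not a figure for Intel's 32 nanometer process. Only the two die areas and the wafer diameter come from the article.

```python
import math

WAFER_DIAMETER_MM = 300.0
X7500_DIE_MM2 = 684.0   # Xeon 7500 die area, from the article
E7_DIE_MM2 = 513.0      # Xeon E7 die area, from the article


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Textbook estimate: wafer area over die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson model: chance that a die of this area has zero killer defects."""
    die_area_cm2 = die_area_mm2 / 100.0
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    assumed_defect_density = 0.1  # hypothetical defects per square centimeter
    for name, area in (("Xeon 7500", X7500_DIE_MM2), ("Xeon E7", E7_DIE_MM2)):
        dies = gross_dies_per_wafer(area)
        good = poisson_yield(area, assumed_defect_density)
        print(f"{name}: {dies} gross dies, {good:.1%} modeled yield")
```

Whatever defect density you plug in, the smaller die wins twice: more candidate chips per wafer, and a larger fraction of them free of defects. The model ignores the fact that a chip with a dud core or cache slice can sometimes be sold as a lower-bin part.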
When you add it all up, staying at ten cores instead of a dozen would make more money for Intel, particularly for workloads where HyperThreading can blunt some of the advantage that AMD has with its twelve-core Opteron 6100s, which do not support simultaneous multithreading.
As the code-name for the Xeon E7 processors suggests, the chip is based on the "Westmere" family of cores, a tweak of the Nehalem cores that adds a few features that are important for servers.
The first is the Trusted Execution Technology (TXT) feature, which Intel originally introduced on its vPro Core PC chips as a means of securing hypervisors running on PCs.
With TXT, the BIOS, firmware, and hypervisor of a machine are checked against a last-known-good configuration at boot time. If there is not a match, the BIOS, firmware, or hypervisor boot is halted until malware is removed from the system.
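The check-against-known-good idea can be sketched in a few lines of Python. This is a simplified illustration of a hash-chained measurement, not Intel's TXT implementation, which relies on dedicated hardware and a Trusted Platform Module. The function names and the byte strings standing in for boot components are all hypothetical.

```python
import hashlib
import hmac


def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component's hash into the running measurement."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(register + measurement).digest()


def measure_chain(components) -> bytes:
    """Measure components in boot order, starting from an all-zero register."""
    register = bytes(20)  # 20 bytes matches the SHA-1 digest size
    for blob in components:
        register = extend(register, blob)
    return register


def launch_allowed(components, known_good: bytes) -> bool:
    """Permit the launch only if the chain matches the recorded good value."""
    return hmac.compare_digest(measure_chain(components), known_good)


if __name__ == "__main__":
    # Stand-ins for the real BIOS, firmware, and hypervisor images.
    trusted = [b"bios-image", b"firmware-image", b"hypervisor-image"]
    known_good = measure_chain(trusted)

    tampered = [b"bios-image", b"firmware-image", b"hypervisor-image+rootkit"]
    print("trusted stack boots:", launch_allowed(trusted, known_good))
    print("tampered stack boots:", launch_allowed(tampered, known_good))
```

Because each step hashes the previous result together with the new measurement, changing any component, or the order in which components load, changes the final value.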
The Westmere class of chips also includes instructions for directly processing the AES algorithm that is commonly used to encrypt and decrypt data.
If you don't think this is a big deal, database giant Oracle did some tests with encrypted 11g R2 databases and has seen an order of magnitude performance increase for database encryption/decryption compared to doing it the hard way with the raw processor (as was the case with Nehalem and earlier families of Xeon server chips).
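If you want to know whether a given Linux box has the AES instructions, the kernel advertises them as an "aes" entry in the flags line of /proc/cpuinfo. The Python sketch below checks for that flag. The helper names are made up for illustration, and the check only says the CPU offers the instructions, not that your database or crypto library is actually using them.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All logical CPUs normally report the same flags,
            # so the first flags line is enough.
            return "aes" in value.split()
    return False


def host_has_aes(path: str = "/proc/cpuinfo") -> bool:
    """Check the running Linux host; False if the file cannot be read."""
    try:
        with open(path) as handle:
            return has_aes_flag(handle.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("AES instructions advertised:", host_has_aes())
```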
And, of course, the Xeon E7 processors support Intel's TurboBoost capability, which lets some cores run at a slightly higher clock speed in the event some of the cores are not doing any useful work and are put to sleep.
There are the obligatory increases in reliability that make the Xeon architecture more like the Itanium processor, which had a lot of the machine check architecture (MCA) features that only high-end RISC and mainframe systems used to have.
With the Xeon E7s, for example, there is a double device data correction (DDDC) RAS feature, which allows a system to recover from two memory device errors without crashing. This feature was supposed to come out in 2008 with the original "Tukwila" Itaniums.
The modified Tukwilas launched as the Itanium 9300s in March 2010. Intel said it has added 25 new RAS features to the Xeon E7s, many of them from Itanium chips.
The Westmere-EX chips also get new-and-improved "Millbrook" memory buffer chips, which allow a four-socket machine using Intel's "Boxboro" 7500 chipset to scale up to 2TB of main memory and a two-socket box to support up to 1TB.
This is double the memory that plain-vanilla Xeon 7500 systems, announced last year, could support – if you discount the memory extension technologies that Cisco Systems, IBM, and Dell added into their own designs. A beefier variant of the Millbrook buffer chip allows for 1.35 volt DDR3 memory to be used instead of normal 1.5 volt sticks.
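The memory ceilings are easy to sanity-check. The Python sketch below works from the 2TB and 1TB figures quoted above. The count of 16 memory slots per socket is an assumption made for illustration, not something stated here, and actual slot counts vary by vendor design.

```python
# Maximum memory in GB by socket count, from the article (1TB = 1024GB).
E7_MAX_GB = {4: 2048, 2: 1024}

# Assumption for illustration only: 16 memory slots per socket.
ASSUMED_SLOTS_PER_SOCKET = 16


def per_socket_gb(sockets: int) -> int:
    """Memory ceiling divided evenly across the sockets."""
    return E7_MAX_GB[sockets] // sockets


def x7500_max_gb(sockets: int) -> int:
    """Plain-vanilla Xeon 7500 ceiling, given the E7 supports double."""
    return E7_MAX_GB[sockets] // 2


def stick_size_needed_gb(sockets: int,
                         slots_per_socket: int = ASSUMED_SLOTS_PER_SOCKET) -> int:
    """Stick capacity needed to hit the ceiling with every slot filled."""
    return E7_MAX_GB[sockets] // (sockets * slots_per_socket)


if __name__ == "__main__":
    for sockets in (2, 4):
        print(f"{sockets} sockets: {per_socket_gb(sockets)}GB per socket, "
              f"{stick_size_needed_gb(sockets)}GB sticks needed, "
              f"Xeon 7500 ceiling {x7500_max_gb(sockets)}GB")
```

Both configurations work out to the same 512GB per socket, which suggests the limit comes from each socket's memory subsystem and not from the socket count.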