Xeon E7 servers run with the big dogs
Gives chase to RISC and Itanium foxes
Deep Dive Intel has come a long way in the server racket, and the new "Westmere-EX" Xeon E7 processor, launched in April and making its way into systems now, is arguably its most sophisticated processor for servers to date.
The Xeon E7 processors cram ten cores onto a single die, but the Xeon E7 design is a bit more than taking an eight-core "Nehalem-EX" Xeon 7500 processor and cramming two more cores onto the chip.
This complete system design marries lots of cores and execution threads to big gobs of shared L3 cache and the high-bandwidth QuickPath Interconnect (QPI), allowing for vendors to create machines that can, in theory, scale from 2 to 32 processor sockets in a single system image.
This kind of scalability is not available in systems based on current Xeon 5600 processors from Intel or those based on Opteron 6100 processors from AMD.
With high core, thread, and socket counts, large memory capacity and bandwidth, and high I/O bandwidth, Xeon E7 systems rival all but the very biggest RISC/Itanium or mainframe systems. And, Xeon E7 machines can do it running Linux or Windows, which have been tuned to scale across the cores, threads, and sockets.
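To put those counts in perspective, here is a back-of-the-envelope sketch in Python of how many hardware threads an operating system would see at each socket count. It uses the ten cores per socket stated above and assumes HyperThreading is switched on with two threads per core; the function name is made up for illustration.

```python
CORES_PER_SOCKET = 10   # from the article: ten cores per Xeon E7 die
THREADS_PER_CORE = 2    # assumption: HyperThreading enabled, two threads per core


def logical_threads(sockets: int) -> int:
    """Hardware threads visible to the OS in a single system image."""
    if sockets < 1:
        raise ValueError("need at least one socket")
    return sockets * CORES_PER_SOCKET * THREADS_PER_CORE


if __name__ == "__main__":
    # Walk the theoretical range quoted above, from 2 to 32 sockets.
    for sockets in (2, 4, 8, 16, 32):
        cores = sockets * CORES_PER_SOCKET
        print(f"{sockets:2d} sockets: {cores:3d} cores, {logical_threads(sockets):3d} threads")
```

Whether a given operating system or hypervisor can actually make use of that many threads is a separate question.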
Nehalem-EX chip architecture
With the "Nehalem-EX" Xeon 7500 processors announced in March 2010, Intel at last shifted its high-end chips away from the slow front side bus architecture.
For a long time this was a limitation in system scalability, requiring vendors to make their own chipsets and L4 caching mechanisms if they wanted to build machines with more than four sockets.
The Nehalem-EX processors also introduced a new chip architecture that put the processor cores on the outside of the chip and the shared L3 caches on the inside of the chip, linked by a super-fast ring interconnect.
This lets the cores share data more efficiently than prior designs, which had cores and their L3 caches cookie-cuttered every which way on the die. This L3 cache ring interconnect design with cores out is to be used in all new high-end Xeon and Itanium processors.
The Xeon E7 chips that came out this year were designed in Intel's Bangalore chip lab and are manufactured using the company's 32 nanometer processes.
The chip weighs in at a whopping 2.6 billion transistors, not much more than the 2.3 billion transistors in the prior Xeon 7500 chips. Those extra 300 million transistors were used to put two more cores onto the chip and to boost the aggregate L3 cache size by 25 per cent to 30MB.
Strangely enough, the shrink from the 45 nanometer processes used with the Xeon 7500 chips in 2010 to the 32 nanometer processes used for the Xeon E7s in 2011 was not used to add even more cores to the design, although it looks like the Xeon E7 was actually a twelve-core design with the bottom two cores chopped off, as you can see:
So why isn’t the Xeon E7 a twelve-core chip, as you might expect?
First, Linux and Windows and the several virtual machine hypervisors can only scale so far right now, so adding a lot more cores does not necessarily get Intel anywhere.
Moreover, as much as customers like more cores – especially in virtualized environments, where they tend to pin one virtual machine on one core for the sake of simplicity – there are plenty who want higher clock speeds.
So Intel used some of the shrink from 45 nanometers to 32 nanometers to boost the core count by 25 per cent and to boost the clock speeds by between 6 and 13 per cent.
And the die size was reduced – from 684 square millimeters for the Xeon 7500 to 513 square millimeters – and that means Intel can cram more chips on a single 300-millimeter wafer. That cuts the cost per chip.
At the same time, a smaller die size increases Intel's Xeon E7 yields because, in theory, using a mature process (as the 32 nanometer wafer baking process is thanks to its ramp last year on PC chips) combined with a smaller die lowers the probability that some booger will screw up all or some of a particular Xeon E7 chip.
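The die-size argument can be roughed out in a few lines of Python. This is a sketch under stated assumptions, not Intel's numbers: it uses a common textbook approximation for gross dies per wafer and a simple Poisson yield model, and the defect density is a made-up placeholder chosen only to show the direction of the effect. Only the two die areas and the wafer diameter come from the article.

```python
import math

WAFER_DIAMETER_MM = 300.0          # from the article
XEON_7500_DIE_MM2 = 684.0          # from the article
XEON_E7_DIE_MM2 = 513.0            # from the article
HYPOTHETICAL_DEFECTS_PER_CM2 = 0.2 # placeholder, NOT a real Intel figure


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Textbook estimate: wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson model: probability that a die picks up zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    for name, area in (("Xeon 7500", XEON_7500_DIE_MM2), ("Xeon E7", XEON_E7_DIE_MM2)):
        dies = gross_dies_per_wafer(area)
        good = poisson_yield(area, HYPOTHETICAL_DEFECTS_PER_CM2)
        print(f"{name}: ~{dies} gross dies, {good:.0%} modelled yield, "
              f"~{dies * good:.0f} good dies per wafer")
```

Both effects push the same way: the smaller die means more candidates per wafer, and each candidate is less likely to catch a defect.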
When you add it all up, staying at ten cores instead of a dozen would make more money for Intel, particularly for workloads where HyperThreading can blunt some of the advantage that AMD has with its twelve-core Opteron 6100s, which do not support simultaneous multithreading.
As the code-name for the Xeon E7 processors suggests, the chip is based on the "Westmere" family of cores, a tweak of the Nehalem cores that adds a few features that are important for servers.
The first is the Trusted Execution Technology (TXT) feature, which Intel originally introduced on its vPro Core PC chips as a means of securing hypervisors running on PCs.
With TXT, the BIOS, firmware, and hypervisor of a machine are checked against a last-known-good configuration at boot time. If there is not a match, the BIOS, firmware, or hypervisor boot is halted until malware is removed from the system.
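The general idea of checking a boot chain against a known-good record can be sketched in Python as follows. This is a toy illustration of measure-and-compare, not how TXT is implemented: the real thing relies on hardware and a Trusted Platform Module, and the hash choice, function names, and component blobs below are all assumptions made for the example.

```python
import hashlib
import hmac


def measure(blob: bytes) -> bytes:
    """Take a fingerprint (here SHA-256, an arbitrary choice) of one boot component."""
    return hashlib.sha256(blob).digest()


def extend(register: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into the running value, so order and content both matter."""
    return hashlib.sha256(register + measurement).digest()


def measured_launch(components: list[tuple[str, bytes]]) -> bytes:
    """Chain the measurements of every component in boot order."""
    register = bytes(32)  # start from all zeroes
    for _name, blob in components:
        register = extend(register, measure(blob))
    return register


def verify_boot(components: list[tuple[str, bytes]], known_good: bytes) -> bool:
    """True only if the chained measurement matches the stored known-good value."""
    return hmac.compare_digest(measured_launch(components), known_good)


if __name__ == "__main__":
    # Hypothetical component images standing in for BIOS, firmware, and hypervisor.
    stack = [("bios", b"bios-v1"), ("firmware", b"fw-v1"), ("hypervisor", b"hv-v1")]
    known_good = measured_launch(stack)

    tampered = stack[:2] + [("hypervisor", b"hv-v1-with-rootkit")]
    print("clean stack boots:   ", verify_boot(stack, known_good))
    print("tampered stack boots:", verify_boot(tampered, known_good))
```

Because each measurement is folded into the previous value, changing any one component, or the order they load in, changes the final result and the comparison fails.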
The Westmere class of chips also includes instructions for directly processing the AES algorithm that is commonly used to encrypt and decrypt data.
If you don't think this is a big deal, database giant Oracle did some tests with encrypted 11g R2 databases and has seen an order of magnitude performance increase for database encryption/decryption compared to doing it the hard way with the raw processor (as was the case with Nehalem and earlier families of Xeon server chips).
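If you want to know whether a given x86 Linux box has these AES instructions, the kernel lists them as the `aes` flag in `/proc/cpuinfo`. Here is a minimal Python sketch that parses that text; the function names are invented for the example, and it says nothing about whether your database or crypto library actually uses the instructions.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_instructions(cpuinfo_text: str) -> bool:
    """True if the CPU advertises the AES instruction set extension."""
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AES instructions:", has_aes_instructions(path.read_text()))
    else:
        print("no /proc/cpuinfo here; this check only works on Linux")
```

Matching on whole flag names rather than substrings avoids false positives from other flags that happen to contain the letters "aes".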
And, of course, the Xeon E7 processors support Intel's TurboBoost capability, which lets some cores run at a slightly higher clock speed in the event some of the cores are not doing any useful work and are put to sleep.
There are the obligatory increases in reliability that make the Xeon architecture more like the Itanium processor, which had a lot of the machine check architecture (MCA) features that only high-end RISC and mainframe systems used to have.
With the Xeon E7s, for example, there is a double device data correction (DDDC) RAS feature, which allows a system to recover from two memory errors without crashing. This feature was supposed to come out in 2008 with the original "Tukwila" Itaniums.
The modified Tukwilas launched as the Itanium 9300s in March 2010. Intel said it has added 25 new RAS features to the Xeon E7s, many of them from Itanium chips.
The Westmere-EX chips also get new-and-improved "Millbrook" memory buffer chips, which allow a four-socket machine using Intel's "Boxboro" 7500 chipset to scale up to 2TB of main memory and a two-socket box to support up to 1TB.
This is double the memory that plain-vanilla Xeon 7500 systems, announced last year, could support – if you discount the memory extension technologies that Cisco Systems, IBM, and Dell added into their own designs. A beefier variant of the Millbrook buffer chip allows for 1.35 volt DDR3 memory to be used instead of normal 1.5 volt sticks.
Read carefully what was written...
>> allowing for vendors to create machines that can, in theory, scale from 2 to 32 processor sockets in a single system image.
What he specifically _didn't_ say is that you would be able to go beyond 8 sockets in a glue-less design...
If however you look at the glue'd designs on the market at the moment (most obvious one I guess is the DL980G7) it's pretty clear to see how a single system could be scaled to 16 or 32 sockets (heck, on the back of the DL980 you can even see the interconnects that might be used to do this!)
And if you consider the similarity from a chipset perspective between Westmere and Tukwila processors now, it's not that difficult to imagine an HP Superdome with x86 processors either...
The biggest challenge as you extend to x86 systems of this size isn't the resiliency of the hardware, it's how the OS interacts with the hardware and firmware during failure conditions - this is one area where Linux and Windows sadly lag behind the commercial UNIX OSs at the moment (I have no idea how well Solaris/x86 handles all this, but my bet is poorly given their general poor showing in this space even on SPARC)
There is TCA and then there is TCO.. There is a big difference between the 'buy and throw away' mentality of the x86 world and the buy, upgrade, upgrade, upgrade... of the UNIX world.
I mean the cost of setting up a new server, if you operate in an ITIL environment as I do, is well.. almost more expensive than the actual server.
But if you are a small shop... then your mileage might vary..
IMHO TPM had a little too much of the Xeon Kool-Aid :)=
There is absolutely no doubt that the current Westmere-EX chip is one of the finest in the industry, as it is clearly at the top of the pack with perhaps only one to challenge it.
But when that is said, then best of breed POWER7 beats best of breed Westmere-EX every time, hands down.
And where Westmere-EX servers are brand new, POWER7 has been around for almost 1 1/2 years in POWER servers. Sure you have to compare what is shipping with what is shipping.
And TPM, you have an error in your TPC-H POWER 780 bit: the POWER 780 used in the TPC-H benchmark is the 4 core/chip version, hence it's a 32 core submission. So if you compare it versus the 80 core x3950/3850 (which btw also has 4 times the memory), then the POWER machine has 2.4 times the core performance while delivering 95% of the chip performance. Now that is with 40% of the cores per chip.
So yeah, Westmere-EX is great.. but not the greatest.