Original URL: http://www.theregister.co.uk/2010/08/25/westmere_ex/
Intel details 10-core Westmere-EX server silicon
'Only ten? Wanna make something of it?'
Hot Chips Intel has confirmed that its upcoming Westmere-EX server processor will be a 10-core, server-enhanced, single-die beastie.
No surprise. Back in June, when the program for this week's Hot Chips conference was published, it listed a session in which Intel would discuss "A 20 Thread Server CPU". It was a no-brainer to deduce the Westmere-EX's core count from that bit of info, seeing as how Intel's Hyper-Threading Technology is a two-threads-per-core scheme.
And that's exactly what Intel announced on Tuesday at the conference, held on the campus of Stanford University in Palo Alto, California.
Also unsurprising was that Intel engineer Dheemanth Nagaraj, the lead microarchitect for the Westmere-EX and its presenter at Hot Chips, neither revealed any details about the processor's eventual clock speeds nor made any definitive predictions of its performance.
What was news, however, was the set of server-specific enhancements revealed Tuesday, including capabilities intended to improve performance and reliability, upgrade security, and aid virtualization.
"This is a product that is not shipping yet," Nagaraj explained to his audience — but didn't provide any information as to when that ship date might be. Remember, though, that the Nehalem-EX was launched just this March, and you can be sure that Intel plans to make a bit of coin off that investment before it obsoletes it with the Westmere-EX.
When it eventually ships, the EX will round out the Westmere effort — a process-shrinking "tick" that came after Nehalem's microarchitecture-rejuvenating "tock". In Intel's cutesy terminology, the "tick-tock" cadence is a process shrinkage followed by a microarchitecture upgrade followed by a process shrinkage followed by a microarchitecture upgrade followed by...
Intel rolled out the first members of the Westmere line — mobile and desktop chips — at the Consumer Electronics Show this January in Las Vegas after previewing them last December. The compute cores in all Westmere chips are substantially similar — "converged cores," as Nagaraj calls them. He did note, however, that the EX will have "server-specific features ... baked into the converged core."
The most obvious differences among Westmere processors — from the Core i3 up to the Westmere-EX — are their core counts and differences in their "uncore" elements such as QPI system interconnects, power and clock, integrated memory controllers, and L3 cache — although Intel would prefer that you call the L3 the "last level cache", or LLC.
The Westmere-EX's 10 cores share 10 "slices" of this cache, which are accessed over a bidirectional ring bus. The LLC uses what Nagaraj described as 10-way physical address hashing to avoid hot spots, and can handle five parallel cache requests per clock cycle.
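Intel didn't disclose the actual hash function, so the following is only a toy sketch of the general idea: fold the bits of a physical address together and use the result to pick one of the 10 slices, so that consecutive or strided cache lines don't all land on the same slice. The 64-byte line size, the shift amounts, and the function names are assumptions made for illustration, not Intel's design.

```python
NUM_SLICES = 10   # from the article: 10 LLC slices
LINE_BYTES = 64   # assumed cache-line size, not stated in the talk


def slice_for_address(phys_addr: int) -> int:
    """Map a physical address to a slice index in 0..9 (toy hash, not Intel's)."""
    line = phys_addr // LINE_BYTES
    # Fold some upper address bits into the lower ones so that regular
    # strides are less likely to pile onto a single slice.
    folded = line ^ (line >> 7) ^ (line >> 17)
    return folded % NUM_SLICES


def slice_histogram(addresses):
    """Count how many of the given addresses land on each slice."""
    counts = [0] * NUM_SLICES
    for addr in addresses:
        counts[slice_for_address(addr)] += 1
    return counts


if __name__ == "__main__":
    # 1,000 consecutive cache lines starting at address 0
    print(slice_histogram(range(0, LINE_BYTES * 1000, LINE_BYTES)))
```

A real hash would be chosen to behave well across many access patterns; this one only shows where such a function sits between the address and the slice.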
While such a ring architecture can cause latency problems when a core needs a scrap of info from a distant LLC slice, Nagaraj didn't appear to be too worried. When asked if frequently used cached data could be migrated to a more advantageous location, he responded: "No, we do not support any type of heuristics for migration."
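To put a rough number on "distant", here is a minimal sketch of hop counts on a bidirectional ring. It assumes one ring stop per core-and-slice pair, 10 in all; the real stop count and per-hop latency weren't given, and the function name is hypothetical.

```python
def ring_hops(src: int, dst: int, stops: int = 10) -> int:
    """Shortest hop count between two stops on a bidirectional ring."""
    clockwise = (dst - src) % stops
    # A bidirectional ring lets the request travel whichever way is shorter.
    return min(clockwise, stops - clockwise)


if __name__ == "__main__":
    hops = [ring_hops(0, dst) for dst in range(10)]
    print(hops)                   # hop count from stop 0 to every stop
    print(max(hops))              # worst case
    print(sum(hops) / len(hops))  # average, counting the local slice
```

Under those assumptions the worst case is five hops and the average is 2.5, which may help explain why data migration wasn't judged worth the complexity.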
In addition to its 10 cores running 20 threads and the 10 slices of shared LLC, the chip has four QPI system interconnects, a scalable memory interconnect with support for up to eight DDR channels, and two on-chip memory controllers. It can be used in two, four, and eight-socket configurations gluelessly, or in larger configurations when using a node controller.
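For a sense of scale, the glueless configurations work out as follows. This is plain arithmetic from the figures above (10 cores per socket, two Hyper-Threading threads per core); the constant and function names are just for illustration.

```python
CORES_PER_SOCKET = 10   # from the article
THREADS_PER_CORE = 2    # Hyper-Threading: two threads per core


def system_totals(sockets: int):
    """Return (cores, threads) for a system with the given socket count."""
    cores = sockets * CORES_PER_SOCKET
    return cores, cores * THREADS_PER_CORE


if __name__ == "__main__":
    for sockets in (2, 4, 8):
        cores, threads = system_totals(sockets)
        print(f"{sockets} sockets: {cores} cores, {threads} threads")
```

An eight-socket glueless box would therefore present 80 cores and 160 hardware threads to the operating system.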
Speaking of sockets, the EX is socket-compatible with the Xeon 7500, née Nehalem-EX. As Nagaraj put it, "We believe that Westmere-EX is a compelling refresh to the Boxboro-EX platform," referring to the Xeon 7500's chipset home.
One nifty feature in the Westmere-EX that didn't make it into the Nehalem-EX is what Nagaraj identified as Directory Assisted Snoopy (DAS), a scheme to improve local memory (cache) latency in eight-socket glueless and some node controller–based platforms by removing the need to wait for all snoops to be resolved.
The reason that the Nehalem-EX doesn't have this latency-saving directory-based capability is simple: the Nehalem development team ran out of time. "The directory cache was, frankly, based on an observation that we noticed pretty late," Nagaraj said. Whether you want to call it a microarchitectural "tock" change or not, it made it into the Westmere-EX's process-shrinking "tick".
In another server-specific development, the EX adds, as was assumed it would, six new instructions from the Advanced Encryption Standard New Instructions (AES-NI) ISA extensions for cryptographic acceleration. Nagaraj also claimed that the Westmere-EX has enhanced virtualization support, touting features that will improve VM switch latency and add "VT-x3 real mode addressing" (and no, he wasn't referring to this VTX3).
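Software usually checks for AES-NI before relying on it. On Linux running on x86, the kernel reports the capability as the `aes` entry on the `flags` line of `/proc/cpuinfo`. The sketch below parses that text; the function names are hypothetical, and other operating systems and architectures expose the information differently.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line in the text lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Match whole flag names only, so 'aes' must appear as its own word.
            return "aes" in value.split()
    return False


def host_has_aes_ni(path: str = "/proc/cpuinfo") -> bool:
    """Check the running host; assumes Linux on x86, returns False otherwise."""
    try:
        with open(path) as f:
            return has_aes_flag(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print(host_has_aes_ni())
```

Crypto libraries generally do this detection themselves (typically via the CPUID instruction) and fall back to software AES when the flag is absent.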
Finally, Nagaraj was asked the inevitable question: why only 10 cores when, for example, AMD launched its "Magny-Cours" double-boxcars 12-core package in March and has a 16-core "Interlagos" package on the drawing board?
The Westmere-EX's head microarchitect refused to be drawn into any spat over a perceived "core gap," saying simply that ten cores "gave us the sweet spot for performance and time-to-market." It was simply a design choice, he said, "and where the product fits into the roadmap."
Nagaraj played by the traditional protocol of such presentations, and didn't mention AMD by name. But while Intel has the upper hand today on the server-market front, when AMD's "Bulldozer" new-architecture chips appear next year, things might get more interesting.
That said, by then Intel will have introduced its own new microarchitecture: the tock to Westmere's tick, "Sandy Bridge". ®