Intel goes wide and deep with Xeon E5 assault
Blunting AMD's advantages
If you were planning on buying new servers in the coming weeks and months, Intel just gave you a whole lot of homework. And if you work at Advanced Micro Devices, you're getting some homework, too.
Intel already has a slew of E5-2600 processors aimed at workhorse two-socket machines and a bunch of E7s in different flavors aimed at machines with two, four, or eight sockets. There are E5-1600 processors, predominantly aimed at workstations, and also last year's Xeon E3-1200 processors and the Xeon E3-1200 v2 chips for single-socket servers and workstations, launched today. But wait, that's not all you get. With the full revamping of the Xeon lineup today, Intel is adding 17 more "Sandy Bridge" E5 processors for either two-socket or four-socket boxes.
Now server-makers and their customers will be given a bewildering number of ways to make a Xeon server that has specific CPU, memory, and I/O configurations. And you need to compare these against new Opteron 3200, 4200, and 6200 processors from Advanced Micro Devices if you want to really do your homework.
That said, more SKUs with different prices for different features is generally a good thing for server shoppers, even if you need to shop a little more carefully than you might have in the past.
With the "Sandy Bridge-EN" Xeon E5-2400 and "Sandy Bridge-EP" Xeon E5-4600 launched today, Intel is basically downshifting its existing two-socket and four-socket processors in terms of both features and price to better chase specific markets – and to keep AMD off-balance as it has tried to position its Opteron 4200 and 6200 chips as the cheaper and more core-heavy alternatives to Intel's Xeon E5-2600 and E7-4800 for machines with two or four sockets, respectively.
"We think the E5-2400 will be the preferred product for the HPC market," Dylan Larson, Xeon platform marketing director, tells El Reg. Larson says that the E5-4600, with its denser four socket format, will be "killer for HPC" as well when customers want fatter nodes in their clusters. Moreover, because of the lower pricing on the chips and chipsets compared to the much more expansive Xeon E7 family (in terms of QuickPath Interconnect, memory, and I/O bandwidth and memory and I/O capacity as well as core counts), Larson expects an expansion of the market for four-socket servers more than cannibalization of the E7s in the two-socket and four-socket arenas.
And while Intel doesn't say this explicitly, the much wider Xeon lineup is also being driven in part by OEM customers that are building storage arrays and networking gear based on Xeon chips rather than on PowerPC or proprietary parts. These customers have their own performance, feature, and pricing demands, and if Intel is to double the revenue stream for its Data Center and Connected Systems Group to $20bn by 2015 – as it plans to do – it is going to have to field products that not only compete against AMD in the server racket, but also compete against other circuits that have little to do with servers. Well, excepting that they feed them with data and connect them to the outside world, of course.
A snip and a twist
If you recall the Xeon E5-2600 launch from early March, these chips, which plug into the LGA2011 or Socket R socket, had two QPI links between the sockets, allowing for a massive amount of data interchange between the sockets so they could share a relatively large amount of I/O across those two sockets and also drive PCI-Express 3.0 controllers on the chips and many other I/O devices hanging off the "Patsburg" C600 series of chipsets. Simply put, the new Xeon E5-2400 is a similar "Romley" platform design with one of those QPI links between the processor sockets snipped off, while the Xeon E5-4600 design takes those dual QPI links coming off each processor to gluelessly connect the processors into a four-socket ring.
There is a performance penalty jumping from one processor socket to the one furthest away – two hops instead of one – but plenty of SMP servers have been architected this way and show decent enough performance. If you need better SMP scalability and more reliability, then the E7 is what you need. But remember that the E7 is a relatively expensive box also, due to its buffered memory cards, which help boost its performance and allow it to scale to eight sockets in a single system image.
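For the curious, the hop counts in a four-socket ring are easy to check. What follows is a minimal sketch, assuming each socket links only to its two ring neighbours; the function names `ring_links` and `hops` are made up for illustration and have nothing to do with any Intel tooling.

```python
from collections import deque


def ring_links(n_sockets):
    """Adjacency map for sockets joined in a ring (each links to its two neighbours)."""
    return {s: {(s - 1) % n_sockets, (s + 1) % n_sockets} for s in range(n_sockets)}


def hops(adjacency, src):
    """Breadth-first search: fewest link traversals from src to every socket."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


ring = ring_links(4)
distances = hops(ring, 0)
print(max(distances.values()))  # 2: the socket opposite socket 0 is two hops away
```

The two neighbouring sockets are one hop away and the one across the ring is two, which is the penalty described above.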
And for the record: Intel has no intention of scaling up the E5-4600 to eight sockets and is "making a ton of investments" in future E7 designs. Intel has been mum about exactly what those future E7 plans might be, and Larson was not at liberty to discuss it further.
Both new families of Xeon chips announced today support features that debuted with the Xeon E5-2600s back in March, including the Advanced Vector Extensions (AVX) vector math unit (which can do two 128-bit or one 256-bit floating point operation per clock), Turbo Boost 2.0 clock frequency boosting (which is a lot more sophisticated than the original), on-chip I/O processing (including PCI-Express 3.0 controllers), and Data Direct I/O, which allows Ethernet controllers and other I/O adapters to directly route traffic to processor L3 cache memory instead of making multiple ricochets in and out of main memory as prior generations of Xeon chips did.
Intel's Trusted Execution Technology (TXT) security feature for operating systems and hypervisors is on all of the new Xeon chips, as are the AES-NI instructions for doing AES encryption and decryption in silicon instead of in software. Most of the chips have Turbo Boost as well as HyperThreading, which is a layer of abstraction etched into the Xeon circuits that virtualizes a core to make it look like two virtual threads to operating systems and hypervisors. A few models in the rounded-out Xeon lineup based on Sandy Bridge cores do not have Turbo Boost or HyperThreading. Generally, when Intel deactivates a feature, it charges less money or gooses the performance of some other feature.
Aiming at entry and HPC server buyers
If the Xeon E5-2600 is a draft horse that is bred to pull a plow or heavy cart, then the Xeon E5-2400 was bred to pull a small coach or maybe even to be saddled up to ride. The Xeon E3s are more of a pony, then, in this analogy, and the Xeon E7s would be a Belgian draft horse, and a very large one at that, and the Xeon E5-4600s might pull a milk cart or beer wagon. (This is not a perfect analogy, obviously.) Depending on your workload – what you need to pull and how quickly you need to do it – you can team up the horses first through SMP clustering, where they are lashed to the same plow or wagon, and if you have more work to do, you get multiple teams to carve up the delivery routes or acreage and do the work in parallel (this is looser coupling at the software level instead of in the hardware). The important thing is that you have to feed the horse, and it is expensive. So you get only the right kind of horse for the job, and the minimum number possible.
Block diagram of Intel's Xeon E5-2400 processor
The Xeon E5-2400 is aimed at entry server customers who don't need the memory and I/O expansion of the E5-2600 but who nonetheless need more than the Xeon E3 can offer. Here's one comparison that Intel has ginned up to show the relative performance differences:
The above comparison shows how a single-socket Xeon E3-1200 v2 stacks up against a Xeon E5-2400 with one socket and then two sockets populated. Generally speaking, the Xeon E3 v2 processors have faster clocks, but they only have four cores compared to four, six, or eight with the E5-2400s.
In the example above, Intel is pitting a server with one four-core E3-1280 v2 running at 3.6GHz (this is just shy of being the top bin part) against a machine with one or two top-bin E5-2470s running at 2.3GHz with eight cores. Thanks to the core count, on the SPEC integer, floating point, and Java benchmarks shown, even with the lower clock speeds, a single Xeon E5-2400 can do about 50 per cent more work than the Xeon E3-1200 v2, and adding the second E5-2400 to the box yields around triple the performance because the overhead on two-socket SMP is very low thanks to the peppiness of the QuickPath Interconnect.
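To put a number on "very low overhead", here is a rough sketch that turns the rounded figures quoted above into a scaling efficiency. The helper `scaling_efficiency` is a hypothetical name, and the inputs are the article's approximations ("about 50 per cent more", "around triple"), not measured benchmark scores.

```python
def scaling_efficiency(perf_one_socket, perf_n_sockets, n_sockets):
    """Fraction of ideal linear scaling achieved going from 1 to n sockets."""
    return perf_n_sockets / (perf_one_socket * n_sockets)


# Relative performance, normalised so the single-socket E3-1280 v2 = 1.0.
# Both values are rounded figures from the text, so treat the result as approximate.
e5_2470_one_socket = 1.5
e5_2470_two_sockets = 3.0

print(scaling_efficiency(e5_2470_one_socket, e5_2470_two_sockets, 2))  # 1.0
```

With numbers this rounded, the result comes out at a perfect 1.0; whatever SMP overhead exists is hiding inside the words "about" and "around".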
Like other Sandy Bridge chips, both the Xeon E5-2400 and E5-4600 have 32KB of L1 instruction cache, 32KB of L1 data cache, and 256KB of L2 cache per core. On-chip L3 caches for these two chips weigh in at 20MB, but when Intel finds a chip that meets clock speed parameters but has gunk on a portion of the cache, it goes from a top-bin part down to other parts in these lines, with L3 cache that can be as low as 10MB. The good news is that decommissioned L3 cache saves a bit of power. The bad news is that it adversely affects performance. There ain't no such thing as free oomph.
The Xeon E5-2400 has three DDR3 memory channels per socket with two memory slots per channel, for a total of a dozen sticks for a two-socket box. Memory speeds vary by model, with top-bin parts offering 1.6GHz memory running at 1.5 volts if you need speed (as many HPC shops do). If thermals are an issue, 1.35 volt memory can spin up to 1.33GHz in these systems, and if budget is a concern, 1.07GHz and 800MHz DDR3 memory sticks are also supported. Only the top three SKUs in the E5-2400 range support 1.6GHz memory, and the two lowest bin parts only support the two lowest memory speeds. Memory capacity tops out at 384GB, half of what the E5-2600 offers, because the E5-2600 has more channels and more slots per channel.
Intel's new Xeon E5-2400 processors
The QPI link between the two Xeon E5-2400 sockets also changes along with the memory speed, with the top three bins having a QPI link running at 8GT/sec, the two low bins running at 6.4GT/sec, and the remaining parts in the middle running at 7.2GT/sec. The two low-bin parts do not have Turbo Boost or HyperThreading and only have four cores and burn a little hotter than you might expect. But they have one very important virtue: a really low price. L3 cache sizes decrease as you move from top to bottom bins, as is always the case with Intel Xeon chips.
The Xeon E5-2400 chips have 24 PCI-Express 3.0 lanes per socket, which is lower than the 40 lanes on the E5-2600 but higher than the 16 lanes on the E3-1200 v2 also announced today.
Just so you don't have to go hunting for it, here is a table for the Xeon E5-2600 chips announced back in March so you can see how they differ from the E5-2400s.
For quick reference: Intel's Xeon E5-2600 processors
It will be interesting to see how full systems price out based on the E5-2400 and E5-2600 processors. El Reg will be sorting that out once the server-makers get their machines into the field in the coming weeks. We'll also be sorting out how the new Xeons stack up against the new Opterons inside actual boxes. Suffice it to say that the E5-2400 is aimed at the dual-socket Opteron 4200 server and the E5-2600 is aimed at the two-socket Opteron 6200 server.
Four socket to me
That means, of course, that the Xeon E5-4600 is tag-teaming with the E7-4800 on four-socket Opteron 6200 boxes, with the E7 shooting high and the E5 shooting low and taking on AMD with different price points and feature sets.
The Xeon E5-4600 chip has many of the same feeds and speeds of the E5-2600, as you would expect since it is basically the same chip but with four QPI links on four processors playing ring around the rosy instead of two QPI links for two processors linking arms and doing windmills. (You remember doing both. You're not that old.) The Xeon E5-4600 has four memory channels per socket and up to three memory sticks per channel for up to 48 memory slots and a maximum of 1.5TB of memory shared across the four sockets.
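The slot and capacity figures for the E5-4600, and for the E5-2400 described earlier, fall out of simple multiplication. The sketch below assumes 32GB memory sticks; that size is not stated in the article but is what the quoted maximums (384GB over 12 slots, 1.5TB over 48) imply. The function names are illustrative only.

```python
def memory_slots(sockets, channels_per_socket, dimms_per_channel):
    """Total DIMM slots in the box."""
    return sockets * channels_per_socket * dimms_per_channel


def max_memory_gb(slots, dimm_gb):
    """Capacity with every slot filled with the same size of stick."""
    return slots * dimm_gb


DIMM_GB = 32  # assumption: inferred from the article's capacity totals, not stated

e5_2400_slots = memory_slots(sockets=2, channels_per_socket=3, dimms_per_channel=2)
e5_4600_slots = memory_slots(sockets=4, channels_per_socket=4, dimms_per_channel=3)

print(e5_2400_slots, max_memory_gb(e5_2400_slots, DIMM_GB))  # 12 384
print(e5_4600_slots, max_memory_gb(e5_4600_slots, DIMM_GB))  # 48 1536 (1.5TB)
```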
Intel's Xeon E5-4600 processors
The processors, which are made using Intel's 32 nanometer processes like other Sandy Bridge chips, come in variants with four, six, or eight cores, and some of the processors do not support Turbo Boost or HyperThreading. The three top-bin parts can use 1.5 volt, 1.6GHz main memory, as can the oddball (and HPC-aimed) E5-4617, which runs at the highest clock speed in the E5-4600 lineup at 2.9GHz and which does not support HyperThreading. The two low-bin parts support the cheaper and slower memory, and in the middle you can choose regular or 1.35 volt low-voltage memory running at 1.33GHz, 1.07GHz, or 800MHz.
The Xeon E5-4600 is aimed at four-socket blade servers and density-optimized rack servers that are sometimes used by supercomputer centers, sometimes by enterprises, and increasingly by businesses of all stripes in China. (Call it the China Syndrome, but for whatever reason, four-socket servers are more popular than two-socket machines in China.)
SMP scalability is the whole point of buying a four-socket node, and Intel is showing some pretty good numbers:
Intel trotted out the SPECint_rate2006 and SPECfp_rate2006 integer and floating point benchmarks as well as an internal server virtualization benchmark test to compare a two-socket server using the six-core Xeon E5-2630 – which has a 2.3GHz clock speed and has 15MB of L3 cache – to a four-socket machine using the six-core Xeon E5-4610, which clocks at 2.4GHz and which also has 15MB of cache. Both machines were configured using the C606 chipset – with the duo having 64GB of main memory and the quad having 128GB. As you can see, for these workloads at least, converting that two-socket machine into a four-socket box basically doubles the performance with little overhead for the SMP clustering.
And that is why Intel expects a number of customers – especially those doing server virtualization en masse – to give the Xeon E5-4600s some play in the data center, displacing at least some two-socket machines. Here's why:
By Intel's math, even if you pay slightly more for server hardware for these four-socket boxes (Intel reckons around $12,700 for a four-socket machine compared to just over $7,000 for a two-socket machine), it takes half the number of physical boxes to support the same workload, so you spend $71,800 less buying the bigger boxes at the numbers shown. And when you add in lower operating system licensing, power, cooling, and real estate costs, the quads save you around $264,000 over a four-year term. (This comparison put Windows Server 2008 R2 Enterprise Edition on both sets of boxes, and that may not be the most accurate scenario, but it does drive most of those savings above: about $200,000, in fact.) In any event, Intel says the quad can save you 24 per cent off the total cost of ownership compared to the duo. Do your own comparisons when the servers are out.
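If you want to start on that homework, the arithmetic behind Intel's pitch can be sketched in a few lines. The fleet size of 50 quads below is purely hypothetical (the article does not say how many servers Intel modelled), and the prices are the rounded ones quoted above, so this will not reproduce the $71,800 figure exactly. The function names are illustrative.

```python
def hardware_savings(n_quads, quad_price, duo_price):
    """Purchase savings when n_quads four-socket boxes replace 2 * n_quads two-socket boxes."""
    return 2 * n_quads * duo_price - n_quads * quad_price


def implied_baseline(savings, fraction_saved):
    """Total cost of the more expensive option implied by a saving and its percentage."""
    return savings / fraction_saved


# Hypothetical fleet: 50 quads standing in for 100 duos, at the rounded prices.
print(hardware_savings(50, quad_price=12_700, duo_price=7_000))  # 65000

# A $264,000 saving quoted as 24 per cent implies a duo TCO of roughly $1.1m.
print(round(implied_baseline(264_000, 0.24)))  # 1100000
```

Swap in real quotes and your own server count once the machines ship; the operating system licensing line is the one most likely to move the answer.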
It's a pity, then, that the Sandy Bridge-EP design didn't have three QPI links per socket. If it did, you could probably do a glueless eight-socket box and be a max of three hops between any two CPUs in an SMP cluster. Maybe that's what Intel has in store for some future Xeon. ®