Original URL: http://www.theregister.co.uk/2013/09/10/intel_ivy_bridge_xeon_e5_2600_v2_launch/

Intel carves up Xeon E5-2600 v2 chips for two-socket boxes

Ivy Bridge-EP CPUs slide into Sandy Bridge sockets, chase loads of workloads

By Timothy Prickett Morgan

Posted in Servers, 10th September 2013 19:03 GMT

IDF13 Companies with workloads that like to ride on lots of threads and cores are going to be able to get a lot more bang for a two-socket box thanks to the launch of the "Ivy Bridge-EP" Xeon E5-2600 v2 processors by Intel.

Those with pesky applications that like faster clocks to get more work done, well, the process shrink is giving the Ivy Bridge Xeons more cache and microarchitecture tweaks as well as modestly faster cores. But as we all know, clock scaling is a lot more difficult than core scaling and that is ultimately making this a software problem for companies to solve.

The transition to 22 nanometer TriGate manufacturing processes for the workhorse two-socket server platform from Chipzilla will start a whole new refresh cycle out there in the data centers and data closets of the world. Or, that is the hope at least.

The most eager buyers will be those shops that have much older Xeon 5500 and 5600 systems out there, which have largely burned up their economic life and just do not offer the compute density and memory and peripheral expansion of a current "Romley" server platform sporting an Ivy Bridge-EP processor.

Server shipments and revenues have been on the wane in recent quarters. But x86 systems have fared better than other platforms in fighting the lowering tide caused by the shift to cloud computing (cloud operators tend to buy vanity-free and cheaper machines than service providers of years gone by did), the increasing use of server virtualization, the still awesome pace of Moore's Law (which allows ever more computing capacity per chip), and the skittishness in certain parts of the global economy.

Now, with the Xeon E5-2600 v2 processors shipping, we get to find out in the coming quarters if there is pent up demand for x86 server capacity. If anything, the fact that x86 server shipments were more or less flat in the second quarter, by IDC's reckoning, would seem to indicate that demand is holding up pretty well. Some companies just can't wait to buy servers, even if new and presumably better stuff is coming soon.

Die shot of the ten-core Ivy Bridge-EP processor

The top-end twelve-core Xeon E5-2600 v2 chip has around 4.3 billion transistors and has an area of 541 square millimeters. The Ivy Bridge-EP processors are going to pack a pretty big punch compared to the Sandy Bridge-EP processors they replace in the Intel lineup.

Based on early benchmark test results from server makers that will be divulged in the coming days, Intel executives tell El Reg that customers can expect the new Xeon E5-2600 v2 socket to deliver up to 50 per cent more performance and up to 45 per cent more performance per watt than the Xeon E5-2600 v1 line that was announced in March 2012. Those chips were also known as "Jaketown" by the server techies inside of Intel, and they call the new chip "Ivytown" sometimes just to be consistent. Somewhat.

Those performance figures are based on SPECvirt_sc2013 tests and the bang for the watt numbers come from SPECpower_ssj2008 tests. The performance that customers will see with the Xeon E5-2600 v2 processors will vary, of course.

The basic features of the Xeon E5 v1 and v2 processors

The performance gain, made possible by the shrink from the 32 nanometer processes used in the Xeon E5-2600 v1 processors, comes from a balance of more cores and more L3 cache memory on the chips, as you can see in the table above comparing the two chip families. The top-bin Ivy Bridge-EP parts have 50 per cent more cores, at a dozen per die, and 50 per cent more L3 cache, at 30MB, compared to the Sandy Bridge-EP chips.

The top base frequency and Turbo Boost maximum frequencies on the new Xeon E5-2600 chips go up by just 200MHz, which is only a 6.1 per cent jump in clock speed. That increased clock speed is basically added to the chip to make up for the extra latencies in taking the processor design up to a dozen cores from eight cores.

The increased performance is also enabled by some other tweaks. Main memory now runs at 1.6GHz for 1.35 volt memory (up from 1.33GHz) and at 1.87GHz for 1.5 volt sticks (up from 1.6GHz). The PCI-Express 3.0 controllers run at the same speed (8GT/sec) and there are the same 40 lanes of bandwidth coming into the on-die controllers as with the Sandy Bridge-EP parts. Main memory is also doubled up to a maximum of 1.5TB (through the use of 64GB sticks in the 24 slots in a two-socket system).
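The capacity and I/O figures above are easy to sanity-check. What follows is a rough sketch rather than anything from Intel: the slot count, stick size, lane count and transfer rate come from the article, while the 128b/130b line coding is an assumption taken from the PCI-Express 3.0 specification, and the function names are made up for illustration.

```python
# Back-of-envelope figures for a two-socket Ivy Bridge-EP box.
# Inputs come from the article; the encoding overhead is an assumption
# based on PCI-Express 3.0's 128b/130b line coding.

DIMM_SLOTS = 24            # slots in a two-socket system (from the article)
DIMM_SIZE_GB = 64          # largest supported stick (from the article)

PCIE_LANES = 40            # lanes into each chip's on-die controllers
PCIE_GT_PER_SEC = 8.0      # PCI-Express 3.0 transfer rate per lane
PCIE_ENCODING = 128 / 130  # assumed 128b/130b coding efficiency


def max_memory_tb(slots: int = DIMM_SLOTS, dimm_gb: int = DIMM_SIZE_GB) -> float:
    """Maximum memory in TB, counting 1 TB as 1024 GB."""
    return slots * dimm_gb / 1024


def pcie_gb_per_sec(lanes: int = PCIE_LANES) -> float:
    """Peak one-direction PCIe bandwidth in GB/sec, before protocol overhead."""
    gbit_per_sec = lanes * PCIE_GT_PER_SEC * PCIE_ENCODING
    return gbit_per_sec / 8


if __name__ == "__main__":
    print(f"Max memory: {max_memory_tb():.1f} TB")
    print(f"PCIe peak:  {pcie_gb_per_sec():.1f} GB/sec per direction")
```

Real-world PCI-Express throughput will come in lower once packet headers and flow control are counted, so treat the second number as a ceiling.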

Different SKUs in the Ivy Bridge-EP line support different QuickPath Interconnect point-to-point interconnect speeds, as did their predecessors, and the QPI port count stands at two as it did with the Sandy Bridge-EP chips. Both families of chips support HyperThreading, Intel's implementation of simultaneous multithreading.

SMT virtualizes the instruction pipeline in the processor so that, in this case, it can juggle two instruction streams at the same time and therefore get somewhat more work done than would otherwise be possible. (Provided your workloads are HT-friendly, of course.) Not every memory speed is supported on every chip, again just like the predecessor Sandy Bridge-EP.

Here's another new and interesting thing. There is not one Ivy Bridge-EP processor, but rather there are three different variants of the chip, each one tuned for specific workloads and each sporting different numbers of cores, memory controllers, cache sizes, frequencies, and thermal envelopes.

Block diagrams of the three Ivy Bridge Xeon E5 processors

The first variant has four or six cores active and the PCI-Express and QPI links as well as a single memory controller with four channels. It is, explains Ian Steiner, a processor architect at Intel's Beaverton, Oregon facility, aimed at both low-power uses as well as at workloads that need higher frequencies.

This one has 15MB of L3 cache and has a thermal envelope of between 40 and 80 watts. The cores, cache segments, QPI links, and PCI controllers are hooked to each other by double rings, just as was the case with the Sandy Bridge-EPs.

The second variant, which addresses the belly of the two-socket server market, offers six, eight, or 10 cores and has 25MB of L3 cache on the die. The same double rings link the core components together. In this case, the thermals range from 70 to 130 watts and again, there is a mix of low-power and higher frequency variants to target different kinds of workloads.

The third type of Ivy Bridge-EP processor is the full-on twelve-core beast, which comes in 115 watt and 130 watt options. Intel has killed off the 135 watt SKU for servers, but there is a 150 watt part for workstations, as in the past.

This chip has three rings linking the cores and cache segments to other components on the die, and as you can see, the memory controller is also broken into two but has half as many channels hanging off each controller to yield the same four channels per socket as the other Xeon E5-2600 v2 variants.
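One pattern in the three dies is worth a quick calculation: L3 cache per core. The sketch below uses the core counts and cache sizes quoted in the article. The Sandy Bridge-EP row (eight cores, 20MB) is inferred from the "50 per cent more" comparison rather than stated outright, so treat it as an assumption, and the die labels are invented for illustration.

```python
# L3 cache per core for each die, using figures quoted in the article.
# Each entry is (maximum cores on the die, L3 cache in MB).
# The Sandy Bridge-EP entry is inferred from the "50 per cent more cores
# and cache" comparison and is an assumption of this sketch.

DIES = {
    "Ivy Bridge-EP small die": (6, 15),
    "Ivy Bridge-EP middle die": (10, 25),
    "Ivy Bridge-EP large die": (12, 30),
    "Sandy Bridge-EP (inferred)": (8, 20),
}


def l3_per_core_mb(cores: int, l3_mb: float) -> float:
    """L3 cache in MB available per active core."""
    return l3_mb / cores


if __name__ == "__main__":
    for name, (cores, l3_mb) in DIES.items():
        print(f"{name:28s} {l3_per_core_mb(cores, l3_mb):.2f} MB per core")
    # A six-core part cut from the middle die keeps the full 25MB.
    print(f"{'6 cores on middle die':28s} {l3_per_core_mb(6, 25):.2f} MB per core")
```

With every core switched on, each die works out to 2.5MB of L3 per core, the same ratio as the inferred Sandy Bridge-EP figure. A six-core part on the middle die gets roughly 4.2MB per core instead.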

In the past, these might have been two or even three different processors, possibly with different sockets. But they are one processor family all sharing the same socket, and one that is identical to the earlier Xeon E5-2600 v1 processors from March 2012.

The Romley server platform designed to take Ivy Bridge-EP parts

"The general goal is to do everything well," explains Steiner, and that cannot be accomplished with a single variant of the Ivy Bridge-EP processor. "We are interested in having some high frequency, low core parts." And the middle variant from six to ten cores was designed explicitly so it would have 25MB cache against six cores – again, precisely to match the needs of particular (and unnamed) customers.

"This is sort of right in the middle. You get good power efficiency, you get peak performance and you can push it all the way up to 130 watts if you want. The twelve-core is mostly targeted at peak performance, there are just 115 watt and 130 watt offerings, and there are no low power options. But I don't want to pretend that it is not a power-efficient SKU. It can actually be very power-efficient in a full rack deployment," he said.

The Ivy Bridge-EP core is identical to that used in the desktop Ivy Bridge Core parts from last year, and it sports a number of microarchitecture improvements. Steiner says that Intel is not just focused on single-thread performance improvement with each generation, but also on boosting the instructions per clock and the power efficiency of the core.

"Long story short, we have added a bunch of stuff to make performance work better," says Steiner. The new Ivy Bridge core has a 16-bit to single-precision floating point converter, which won't be a "huge performance thing but it is nice for certain workloads," according to Steiner.

Programmers using earlier generations of Xeon have had to write their own routines to do fast copy/fill operations. Now the string instructions with the unwieldy name of REP MOVSB/STOSB have been sped up, which means coders don't have to monkey around with hand-tuned assembler and can just invoke these instructions to do copy/fill. The core also now has fast access to sets of registers by user threads, which is an optimization aimed precisely at server workloads running on machines with higher thread counts.
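To make the 16-bit floating point conversion mentioned above a little more concrete, here is a software-only sketch of what such a converter does. It is not the hardware path and says nothing about how Intel implements it; it leans on the IEEE 754 half-precision format code in Python's standard `struct` module, and the two helper names are made up for illustration. Python floats are double precision, but every half-precision value fits exactly in single precision too, so the widening step loses nothing.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision bit pattern into a float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half_bits(value: float) -> int:
    """Round a float to the nearest half-precision value; return its bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


if __name__ == "__main__":
    # Widening is exact: 0x3C00 is 1.0 and 0x7BFF is the largest finite half.
    print(half_bits_to_float(0x3C00), half_bits_to_float(0x7BFF))
    # Narrowing rounds: 0.1 cannot be stored exactly in 16 bits.
    bits = float_to_half_bits(0.1)
    print(hex(bits), half_bits_to_float(bits))
```

The appeal of the 16-bit format is storage and bandwidth: data can sit in memory at half the size and be widened only when it is time to do arithmetic.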

The Ivy Bridge core also includes the "Bull Mountain" random number generator, known as SecureKey. Other server-class chips already had random number generators, and now Intel has caught up.

The "Avoton" Atom C2000 chip also has the random number generator, and it also sports the OS Guard supervisor mode execution protection circuits that are embedded in the Ivy Bridge core.

You have to change the operating system code to make use of OS Guard, which prevents execution of user mode pages while in supervisor mode. That protects against hacks that hijack kernel execution, such as the method used by Stuxnet.

It is supported in the Linux kernel already, and presumably support for OS Guard will come to Windows at some point. (It was not yet ready when Intel gave its briefings on the new Ivy Bridge chips for servers.)
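On a Linux box, one way to see which of these core features the kernel reports is to look at the flags line in /proc/cpuinfo. The sketch below is a hypothetical helper, not an Intel or kernel tool. The flag names are the ones Linux commonly uses, and the mapping of "fast access to sets of registers by user threads" to the `fsgsbase` flag is an assumption on our part.

```python
# Hypothetical helper: report which of the Ivy Bridge-era features discussed
# above a Linux kernel advertises in /proc/cpuinfo. The flag-to-feature
# mapping is an assumption of this sketch, not something from Intel.

FEATURE_FLAGS = {
    "f16c": "16-bit to single-precision float conversion",
    "erms": "enhanced REP MOVSB/STOSB",
    "fsgsbase": "fast user-thread access to FS/GS base registers (assumed)",
    "rdrand": "hardware random number generator",
    "smep": "supervisor mode execution protection (OS Guard)",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str) -> dict:
    """Map each feature flag of interest to True/False for this CPU."""
    flags = parse_flags(cpuinfo_text)
    return {name: name in flags for name in FEATURE_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    for name, present in report(text).items():
        mark = "yes" if present else "no"
        print(f"{name:9s} {mark:3s} {FEATURE_FLAGS[name]}")
```

A flag showing up only means the CPU has the feature and the kernel knows about it; whether the protection is actually switched on depends on the kernel build and boot options.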

It is not just about the core

The "uncore" portions of the new Xeon E5 chip have an improved snoop directory, which moves from one-bit to two-bit (but don’t call it a two-bit snoop directory, mind you) and which is enabled in all variants of the Ivy Bridge-EP. The prior snoop directory was only available on the four-socket variants of the Sandy Bridge-EP chips and was disabled in the two-socket variants.

Now it is enabled on the two-socket Xeon E5-2600 v2 and will presumably be available on the Xeon E5-4600 v2 when Intel gets around to launching them for four-socket machines. (No word on that from Chipzilla at this point.)

The L3 cache controller has had the bits doubled up to two as well in its least recently used (LRU) unit, which Steiner says improves cache hit rates. The twelve-core variant has three rings and two memory controllers instead of two rings and one memory controller, which allows for the cache bandwidth to scale linearly with the core counts.

The PCI-Express 3.0 controller on the die supports x16 non-transparent bridge (NTB), up from x8 on the Sandy Bridge-EP chips, and this is a feature that is largely targeted to the high-end of the storage market, according to Steiner.

This allows CPU-to-CPU linking across server nodes over the PCI-Express bus, and is used to run RAID 5 and 6 data protection code over multiple processors embedded in multi-node disk controllers. The PCI-Express controller also has deeper queues and improved arbitration to boost bandwidth as well as some latency reductions that will be handy for HPC customers.

There are eighteen members of the E5-2600 v2 family, and here they are without further ado:

The new Xeon E5 v2 processors

Intel is grouping the chips based on their capabilities, and just to be confusing, these groupings have nothing to do with the three different variants of the Ivy Bridge dies with six, ten, or twelve cores on the chip.

The Advanced chips have DDR3 memory running at 1.87GHz and their QPI links run at 8GT/sec, with the exception of the E5-2650L v2, which has memory running at only 1.6GHz. The four Standard variants of the Ivy Bridge-EP chips have memory running at 1.6GHz and QPI links that run at 7.2GT/sec, while the two Basic variants step down the memory to 1.33GHz and the QPI links to 6.4GT/sec.
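To put the three tiers in bandwidth terms, here is a rough sketch that turns the memory and QPI speeds above into peak GB/sec figures. The speeds and the four channels per socket come from the article. The widths are assumptions: 8 bytes per transfer for a DDR3 channel and 2 bytes of payload per transfer in each direction for a QPI link. The memory "GHz" figures are treated as transfer rates.

```python
# Peak bandwidth per tier, from the memory and QPI speeds quoted above.
# Assumptions (not from the article): a DDR3 channel moves 8 bytes per
# transfer and a QPI link moves 2 payload bytes per transfer per direction.

CHANNELS_PER_SOCKET = 4      # from the article
DDR3_BYTES_PER_TRANSFER = 8  # assumed 64-bit channel
QPI_BYTES_PER_TRANSFER = 2   # assumed, per direction

# tier name -> (memory speed in GT/sec, QPI speed in GT/sec)
TIERS = {
    "Advanced": (1.87, 8.0),
    "Standard": (1.6, 7.2),
    "Basic": (1.33, 6.4),
}


def memory_gb_per_sec(mem_gt: float) -> float:
    """Peak memory bandwidth per socket in GB/sec."""
    return mem_gt * DDR3_BYTES_PER_TRANSFER * CHANNELS_PER_SOCKET


def qpi_gb_per_sec(qpi_gt: float) -> float:
    """Peak bandwidth of one QPI link in one direction, in GB/sec."""
    return qpi_gt * QPI_BYTES_PER_TRANSFER


if __name__ == "__main__":
    for tier, (mem_gt, qpi_gt) in TIERS.items():
        print(f"{tier:9s} memory {memory_gb_per_sec(mem_gt):5.1f} GB/sec, "
              f"QPI {qpi_gb_per_sec(qpi_gt):4.1f} GB/sec per link")
```

These are theoretical peaks; sustained numbers on real workloads will be lower.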

There is a Xeon E5-2687W targeted solely at two-socket workstations, which has eight cores and which runs at 3.4GHz. It burns at 150 watts and has memory and QPI links cranked up to their full speeds.

If you want to make comparisons, which El Reg loves to do, here are the feeds and speeds of the prior Sandy Bridge-EP chips:

The Sandy Bridge-EP Xeon E5 v1 processors

On a very rough basis, the price/performance of the two Xeon E5-2600 families is the same, as you can see from the Cost Per Oomph stats in the two tables above. Oomph is the number of cores times the clock speed, which is a very rough indicator of raw performance within a processor architecture. Yes, it is imperfect.

Pricing is the traditional per-unit cost of the chips when bought in 1,000-unit trays from Chipzilla. But there are some interesting comparisons to make that show you can get either more bang for about the same buck or the same bang for a bit less buck.

First, look at the eight-core E5-2667 v2 running at 3.3GHz and sporting a 25MB L3 cache; this chip costs $2,057. The most similar E5-2600 v1 part is the E5-2690, which was the top-bin part and which cost the same $2,057, but the Ivy Bridge variant delivers about 14 per cent more performance and 5MB more of L3 cache to boot.
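The Oomph arithmetic behind that comparison can be sketched in a few lines. The E5-2667 v2 figures and both prices come from the text above. The eight cores and 2.9GHz clock for the E5-2690 are not stated in the text and are filled in here as an assumption, and the `Chip` class is just an illustrative container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    ghz: float
    price_usd: float

    @property
    def oomph(self) -> float:
        """Cores times clock speed, the article's rough performance proxy."""
        return self.cores * self.ghz

    @property
    def cost_per_oomph(self) -> float:
        """List price divided by Oomph; lower is better."""
        return self.price_usd / self.oomph


E5_2667_V2 = Chip("E5-2667 v2", cores=8, ghz=3.3, price_usd=2057)
# Core count and clock for the E5-2690 are assumed, not taken from the text.
E5_2690 = Chip("E5-2690", cores=8, ghz=2.9, price_usd=2057)

if __name__ == "__main__":
    for chip in (E5_2690, E5_2667_V2):
        print(f"{chip.name:11s} oomph {chip.oomph:5.1f}  "
              f"${chip.cost_per_oomph:.2f} per oomph")
    gain = E5_2667_V2.oomph / E5_2690.oomph - 1
    print(f"Oomph gain at the same price: {gain:.1%}")
```

Under those assumptions the gain works out to a shade under 14 per cent, which lines up with the figure quoted above. It says nothing about cache or microarchitecture effects, which Oomph ignores by design.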

The ten-core chips run at slower clock speeds, and offer better bang for the buck using the raw Oomphs. But some apps won't scale across 20 cores and 40 threads, so this extra computing capacity might be lost on them.

As you might expect, the two top-bin twelve-core Ivy Bridge chips in the lineup are the most expensive ones in the line and also have the worst bang for the buck, yielding profits for Intel from those customers who need more cores. It will be interesting to see if server makers try to charge a premium for the other Ivy Bridge-EP chips and therefore boost their own profits or just slide them into their existing machines at roughly the same prices.

The temptation is no doubt there to try to get some profits on the front-end of the Ivy Bridge wave among server makers, but with the cut-throat pricing in the server market these days, that probably won't happen and Intel will be the one getting whatever profits there are from the new CPUs. ®