Original URL: http://www.theregister.co.uk/2012/03/06/intel_xeon_2600_server_chip_launch/

Intel plugs both your sockets with 'Jaketown' Xeon E5-2600s

Oof! Chipzilla unzips double bulge and gets its stuff out

By Timothy Prickett Morgan

Posted in The Channel, 6th March 2012 17:00 GMT

The Xeon E5-2600, formerly known by the code-name "Jaketown" inside Intel and as "Sandy Bridge-EP" when Intel referred to it externally, is finally here for mainstream two-socket servers.

And now the 2012 server cycle begins in earnest – even if it wasn't soon enough for the most ardent data-center motorheads.

The Xeon E5-2600 is one of a family of server chips based on the Sandy Bridge architecture that Intel has either put out in 2011 or will put out in 2012. The chips are based on a similar core design employed in the current 2nd Generation Core processors for laptops and desktops, and the Xeon E3-1200 processors for single-socket servers and workstations, which came out early last year.

Among the many new features in the Sandy Bridge architecture that are shared across all processors are support for Advanced Vector Extensions (AVX) floating point math and a revamped Turbo Boost 2.0 overclocking mechanism that is more efficient and flexible than the prior implementation.

The Xeon E5-2600 package

All Sandy Bridge processors are manufactured in Intel's 32-nanometer wafer-baking processes, not the 22nm "Tri-Gate" process that Intel will fire up later this year for the "Ivy Bridge" family of PC processors.

Intel generally does not debut a new process on its volume server chips, so the 22nm shrink will reach the Xeons later, in Ivy Bridge variants. But that is not today.

Intel was widely expected to put the Xeon E5-2600 processors into the field in full volume last fall, but instead it shipped the chips in limited volume (and under non-disclosure) to selected HPC and hyperscale data-center customers who could not wait until today's launch to get to Jaketown.

The Xeon E5-2600 is not just a Xeon E3-1200 with some two-way SMP glue slapped on to make it play nicely with the "Patsburg" C600 chipset that's part of the server platform Intel calls "Romley." There's a lot more to it than that.

Intel briefed El Reg about all the technical goodies in the Xeon E5-2600 processors ahead of the launch, and walked us through the details of the chip, which weighs in at 2,263,000,000 transistors on a die measuring 416 square millimeters. The Xeon E5-2600 packs a lot of functionality into those transistors.
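As a quick sanity check on those two figures, dividing the transistor count by the die area gives the average transistor density. This is a back-of-the-envelope sketch using only the numbers quoted above; real density varies a good deal between dense cache arrays and logic.

```python
# Figures quoted in the article for the Xeon E5-2600 die.
TRANSISTORS = 2_263_000_000
DIE_AREA_MM2 = 416


def density_millions_per_mm2(transistors: int, area_mm2: float) -> float:
    """Average transistor density, in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


density = density_millions_per_mm2(TRANSISTORS, DIE_AREA_MM2)
print(f"{density:.2f} million transistors per mm^2")  # about 5.44
```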

The Xeon E5-2600 core

Let's start with the cores.

Each core on the Xeon E5-2600 chip has a completely revamped branch predictor hanging off its 32KB instruction cache. The entire "front end" of the chip (the L1 instruction cache, predecode unit, instruction queue, decoder unit, and out-of-order execution unit) has been designed to sustain higher micro-op bandwidth while using less power, by switching off parts of the front end whenever instructions can be fed from the micro-op cache added to each core.

The core has 32KB of L1 data cache and 256KB of L2 (or mid-level, as Intel sometimes calls it) cache memory. It has two load units, which can do two 128-bit loads per cycle, and a store unit.
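Two 128-bit loads per cycle works out to 32 bytes per cycle from the L1 data cache. The sketch below turns that into a per-core peak figure; the 3.3GHz clock is simply the top base clock in the initial lineup, picked for illustration, and the result is a theoretical ceiling that assumes both load units stay busy every cycle with no cache misses.

```python
LOADS_PER_CYCLE = 2     # two load units, per the article
LOAD_WIDTH_BITS = 128   # each load is 128 bits wide


def l1_load_bytes_per_cycle(loads: int = LOADS_PER_CYCLE,
                            width_bits: int = LOAD_WIDTH_BITS) -> int:
    """Peak bytes the core can load from L1 each cycle."""
    return loads * width_bits // 8


def l1_load_gb_per_s(clock_ghz: float) -> float:
    """Peak per-core L1 load bandwidth: bytes/cycle times 1e9 cycles/s gives GB/s."""
    return l1_load_bytes_per_cycle() * clock_ghz


print(l1_load_bytes_per_cycle())            # 32 bytes per cycle
print(f"{l1_load_gb_per_s(3.3):.1f} GB/s")  # illustrative 3.3GHz clock
```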

Die shot of the Xeon E5-2600

The AVX unit on each core can do two floating point operations per cycle, and those operations are now 256 bits wide rather than the 128 bits the current Xeon 5600s handle, which doubles peak floating-point throughput per clock. This is a huge jump, and one that matches what AMD can do with a "Bulldozer" core with half the cores turned off and the scheduler running 256-bit floating-point instructions through half the cores.
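The doubling is easiest to see by counting double-precision operands. The sketch assumes, as Sandy Bridge is commonly described, that each core issues one vector add and one vector multiply per cycle, so a 256-bit AVX register holds four 64-bit doubles where a 128-bit SSE register holds two. The eight-core, 2.9GHz configuration is an illustrative choice within the lineup's clock range, and Turbo Boost is ignored.

```python
def dp_flops_per_cycle(vector_bits: int, ops_per_cycle: int = 2) -> int:
    """Peak double-precision flops per core per cycle.

    Assumes ops_per_cycle vector operations (one add plus one multiply by
    default), each working on vector_bits / 64 doubles.
    """
    return ops_per_cycle * (vector_bits // 64)


def peak_dp_gflops(cores: int, clock_ghz: float, vector_bits: int) -> float:
    """Theoretical peak for the whole chip, ignoring Turbo Boost."""
    return cores * clock_ghz * dp_flops_per_cycle(vector_bits)


print(dp_flops_per_cycle(128))  # 128-bit SSE, as on the Xeon 5600: 4
print(dp_flops_per_cycle(256))  # 256-bit AVX on the Xeon E5-2600: 8
print(f"{peak_dp_gflops(8, 2.9, 256):.1f} GFLOPS")  # illustrative 8 cores at 2.9GHz
```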

The Xeon E5-2600 is designed to have as many as eight cores, and like last year's "Westmere-EX" Xeon E7 processor for high-end four-socket and eight-socket servers and the impending "Poulson" Itanium processor expected this year, the Jaketown chip features a "cores out" design and a ring interconnecting the shared L3 caches and those cores so they can share data.

Each core has a 2.5MB segment of L3 cache loosely associated with its core, but these are glued together by the ring into a 20MB shared cache, which Intel often calls the last-level cache.

Obviously, Intel sorts through its chip bins: parts in which every component on the chip works become the top models, while dies with fewer working cores or cache segments are sold as parts with fewer cores and smaller caches. That improves Intel's yields and fleshes out the Xeon E5-2600 line with different performance and price points.

As with the past Xeon and Itanium chips, the Xeon E5-2600 has QuickPath Interconnect (QPI) links coming off the chip to do point-to-point communications between the processors. The QPI link agent, cores, L3 cache segments, DDR3 memory controller, and an "I/O utility box" all have stops on this ring bus. The utility box includes a Direct Media Interface, PCI-Express, and VT-d I/O virtualization as a unit, sitting on this ring bus at the same stop.
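A bidirectional ring lets each request travel whichever way round reaches its destination in fewer hops. The sketch below models that with a hypothetical ordering of eleven stops (the QPI agent, eight combined core and L3-slice stops, the memory controller, and the I/O box); Intel's actual physical layout differs, so the hop counts illustrate the principle rather than the real chip's latencies.

```python
# Hypothetical stop order; "C0".."C7" each stand for a core plus its 2.5MB L3 slice.
STOPS = ["QPI", "C0", "C1", "C2", "C3", "C4", "C5", "C6", "C7", "IMC", "IO"]


def ring_hops(a: int, b: int, n: int) -> int:
    """Fewest hops between stops a and b when traffic can go either way round."""
    d = abs(a - b) % n
    return min(d, n - d)


def one_way_hops(a: int, b: int, n: int) -> int:
    """Hops from a to b if traffic could only travel in one direction."""
    return (b - a) % n


def avg_hops_to_slices(src: str, hop_fn=ring_hops, stops=STOPS) -> float:
    """Average hop count from stop src to every L3 slice stop."""
    n = len(stops)
    i = stops.index(src)
    slices = [j for j, name in enumerate(stops) if name.startswith("C")]
    return sum(hop_fn(i, j, n) for j in slices) / len(slices)


print(avg_hops_to_slices("C0"))                # bidirectional: 3.0
print(avg_hops_to_slices("C0", one_way_hops))  # one direction only: 3.5
```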

Block diagram of the Xeon E5-2600 chip

Jeff Gilbert, the chief architect of the Xeon E5-2600 processor, told El Reg that this bidirectional, full-ring interconnect has more than a terabyte per second of bandwidth coming off and going onto the ring. Those two QPI 1.1 agents that cross-couple the two Xeon E5-2600 processors share an aggregate of 70GB/sec of bandwidth across those two links. The QPI links run at 6.4GT/sec, 7.2GT/sec, or 8GT/sec, depending on the model of the chip.

One big change with the Xeon E5-2600 chips is the integration of PCI-Express 3.0 controllers into the I/O subsystem right there on the die. The PCI Express 3.0 controller on the chip implements 40 lanes of I/O traffic, which is sliced and diced in various ways in conjunction with the Patsburg C600 chipset.
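PCI-Express 3.0 signals at 8GT/sec per lane and uses 128b/130b encoding, so nearly every transferred bit is payload. The sketch below converts that into usable bandwidth per direction for one lane, an x8 slot, and all 40 lanes on a socket, with PCI-Express 2.0 (5GT/sec, 8b/10b encoding) alongside for comparison. It ignores packet and protocol overhead, so real throughput will be somewhat lower.

```python
def lane_gb_per_s(gt_per_s: float, payload_bits: int, line_bits: int) -> float:
    """Usable GB/s per lane, per direction: one bit per transfer, minus line-code overhead."""
    return gt_per_s * payload_bits / line_bits / 8


PCIE3_LANE = lane_gb_per_s(8.0, 128, 130)  # PCI-Express 3.0, 128b/130b
PCIE2_LANE = lane_gb_per_s(5.0, 8, 10)     # PCI-Express 2.0, 8b/10b

for lanes in (1, 8, 40):
    print(f"x{lanes}: {lanes * PCIE3_LANE:.2f} GB/s (Gen3) "
          f"vs {lanes * PCIE2_LANE:.2f} GB/s (Gen2) per direction")
```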

Each E5-2600 socket has four memory channels, up from three with the Xeon 5600s, and you can hang three DIMMs per channel for a total of a dozen per socket. Intel is supporting unregistered and registered DDR3 memory sticks as well as the new load-reduced, or LRDIMM, memory for those who want to get the maximum memory capacity per socket, which stands at 384GB. Regular 1.5 volt as well as "low voltage" 1.35 volt memory are supported on the processors, and memory can run at speeds of 800MHz, 1.07GHz, 1.33GHz, or 1.6GHz.
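Those capacity and speed figures hang together arithmetically. The sketch below assumes standard 64-bit (8-byte) DDR3 channels and treats the quoted memory "speeds" as effective transfer rates in megatransfers per second (so "1.6GHz" memory is DDR3-1600). The bandwidth numbers are theoretical peaks per socket, not measured throughput.

```python
CHANNELS_PER_SOCKET = 4
DIMMS_PER_CHANNEL = 3
MAX_GB_PER_SOCKET = 384
BUS_BYTES = 8  # a DDR3 channel is 64 bits wide


def peak_gb_per_s(channels: int, mt_per_s: int) -> float:
    """Theoretical peak: channels x bytes per transfer x megatransfers per second / 1000."""
    return channels * BUS_BYTES * mt_per_s / 1000


slots = CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL
print(slots, "DIMM slots per socket")                                   # 12
print(MAX_GB_PER_SOCKET // slots, "GB per DIMM at the 384GB maximum")   # 32
print(peak_gb_per_s(4, 1600), "GB/s per socket, Xeon E5-2600 with DDR3-1600")
print(peak_gb_per_s(3, 1333), "GB/s per socket, Xeon 5600 with DDR3-1333")
```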

The important thing about the Xeon E5 and the Romley platform design, explains Ian Steiner, a processor architect at Intel's Beaverton, Oregon facility, is that the platform gets back to the core-to-memory bandwidth ratio of the "Nehalem" Xeon 5500 launched three years ago.

The design gets the cache "out of the way." Thanks to that high-bandwidth ring interconnect on the chip, a bunch of microarchitecture tweaks to the memory controller scheduler, and that extra memory channel, you can now get around double the memory bandwidth when you scale from one to two sockets on a Xeon E5 box.

About 33 per cent of that improvement comes from the move from three to four memory channels, another 20 per cent comes from moving from 1.33GHz to 1.6GHz memory, and the remaining 40 per cent comes from the ring and microarchitecture changes.
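The first two percentages fall straight out of the ratios of channel counts and memory speeds; the 40 per cent ring and microarchitecture share is Intel's figure and cannot be derived this way. The sketch also shows that summing the three gains and compounding them as multiplicative factors give different totals, about 1.9x and 2.2x, both roughly double; which reading Intel intended is not stated.

```python
channel_gain = 4 / 3 - 1       # three to four memory channels
speed_gain = 1600 / 1333 - 1   # 1.33GHz to 1.6GHz memory
ring_gain = 0.40               # Intel's stated share; not derived here

summed = 1 + channel_gain + speed_gain + ring_gain
compounded = (1 + channel_gain) * (1 + speed_gain) * (1 + ring_gain)

print(f"channels: +{channel_gain:.0%}, memory speed: +{speed_gain:.0%}")
print(f"summed: {summed:.2f}x, compounded: {compounded:.2f}x")
```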

Based on internal benchmark tests done by Intel, a two-socket Xeon 5600 box basically started choking at 40GB/sec with 1.33GHz memory. But you can push a two-socket Xeon E5-2600 box to more than 60GB/sec using 1.07GHz memory and as high as 90GB/sec using 1.6GHz memory.
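Intel's measured numbers can be set against the theoretical peak of each configuration. This sketch assumes two sockets, 8-byte DDR3 channels, and the nominal transfer rate for each memory speed; the "more than 60GB/sec" figure is treated as exactly 60, so that efficiency is a floor rather than an exact value.

```python
def peak_gb_per_s(sockets: int, channels: int, mt_per_s: int) -> float:
    """Theoretical peak for a box: sockets x channels x 8 bytes x transfer rate."""
    return sockets * channels * 8 * mt_per_s / 1000


# (label, channels per socket, MT/s, measured GB/s quoted by Intel)
MEASURED = [
    ("Xeon 5600, 1.33GHz memory", 3, 1333, 40.0),
    ("Xeon E5-2600, 1.07GHz memory", 4, 1066, 60.0),
    ("Xeon E5-2600, 1.6GHz memory", 4, 1600, 90.0),
]

efficiency = {}
for label, channels, rate, measured in MEASURED:
    peak = peak_gb_per_s(2, channels, rate)
    efficiency[label] = measured / peak
    print(f"{label}: {measured:.0f} of {peak:.1f} GB/s peak ({efficiency[label]:.0%})")
```

On these assumptions the Xeon 5600 box tops out at a little over 60 per cent of its theoretical peak, while both Xeon E5-2600 configurations land close to 90 per cent.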

Chip off the new block

Clock speeds on the initial batch of Xeon E5-2600 processors run from a low of 1.8GHz to a high of 3.3GHz, and core counts range from as few as two to as many as eight. L3 cache runs from 5MB to 20MB, depending on the model. Here's how the Xeon E5-2600s stack up:

Feeds and speeds of the Xeon E5-2600s

All of the processors listed above support AES-NI encryption/decryption on the chip, as well as Virtualization Technology (VT) circuit-based assistance for hypervisors and Trusted Execution Technology (TXT) security.

The last two parts, the E5-2609 and E5-2603, do not support Turbo Boost (TB) overclocking or HyperThreading (HT), Intel's implementation of simultaneous multithreading. But all of the remaining models in the Xeon E5-2600 line do support HyperThreading, which carves each core into two virtual threads, and Turbo Boost, which adds anywhere from 200MHz to 900MHz to the clock speed of cores on the chip, depending on the model and the number of cores you have turned on when you hit the nitro. On eight-core chips, if you have all eight cores running, you get something between 200MHz and 500MHz, and shutting all the cores down except one will get you between a 500MHz and a 900MHz speed bump for that one core.
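Turbo Boost works in fixed frequency steps, and how many steps are available depends on how many cores are active. The sketch below assumes 100MHz steps and uses a hypothetical bin table for an eight-core part with a 2.7GHz base clock, chosen to fall inside the ranges quoted above (200MHz to 500MHz with all eight cores busy, 500MHz to 900MHz with one). It is not Intel's table for any actual model, and the frequency it returns is a ceiling the chip only reaches when power and thermal headroom allow.

```python
BASE_MHZ = 2700  # hypothetical eight-core part
BIN_MHZ = 100    # assumed size of one Turbo Boost step

# Hypothetical number of turbo bins available for each count of active cores.
TURBO_BINS = {8: 3, 7: 3, 6: 4, 5: 4, 4: 5, 3: 5, 2: 7, 1: 7}


def max_turbo_mhz(active_cores: int, turbo_enabled: bool = True) -> int:
    """Highest clock the bin table allows for the given number of active cores."""
    if not turbo_enabled:  # models without Turbo Boost stay at their base clock
        return BASE_MHZ
    if active_cores not in TURBO_BINS:
        raise ValueError(f"unsupported active core count: {active_cores}")
    return BASE_MHZ + TURBO_BINS[active_cores] * BIN_MHZ


def hardware_threads(cores: int, hyperthreading: bool = True) -> int:
    """HyperThreading presents two threads per core."""
    return cores * (2 if hyperthreading else 1)


print(max_turbo_mhz(8), max_turbo_mhz(1))  # all cores busy vs one core
print(hardware_threads(8))                 # threads per eight-core socket
```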

A platform, not just a chip

The prior generation of Xeon 5600 platforms – by which is meant the combination of processors and chipsets – was called "Tylersburg", perhaps after a dinky town in the middle of the Allegheny National Forest, and maybe the new "Romley" platform is named after a Colorado ghost town once known for mining.

Wherever it gets its name, Romley is the new two-socket server platform, and this is what it looks like:

Block diagram of the Romley server platform

Technically, this is the Romley-EP platform, and El Reg expects Intel to eventually offer a low-cost, two-socket server based on as-yet-unannounced "Sandy Bridge-EN" Xeon E5-2400 processors, as well as a four-socket Romley platform based on a forthcoming Xeon E5 variant. But Intel's top brass in the Data Center and Connected Systems Group are not talking about these other platforms on Tuesday, and would not confirm any of these details.

The Romley-EP platform pairs two sockets on a system board with the Patsburg C600 chipset. Patsburg is basically a "Cougar Point" C200 chipset on steroids, and is technically a Platform Controller Hub (PCH) that has been optimized for server workloads. (The Intel 6 and C200 chipsets are used for PCs, workstations, and entry servers.) This PCH is basically the system clock for the motherboard, plus whatever southbridge functions have not been absorbed onto the processor itself.

Intel's Patsburg chipset diagram

The Romley platform brings it all together, with two QPI links between the processors that allow close-to-ideal symmetric multiprocessing scaling on most workloads. The PCI Express 3.0 controller on each Xeon E5-2600 socket has 40 lanes, which can give you five x8 slots per socket. Then there's I/O bandwidth left over for a PCI-Express 2.0 x4 slot on one socket and a DMI2 link on the other one.
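Five x8 slots is only one way to carve up 40 lanes. The sketch below simply enumerates every mix of x16, x8, and x4 links whose widths add up to exactly 40. It is arithmetic only: real hardware constrains how lanes can be split, so not every combination listed is a configuration a board maker can actually build.

```python
from itertools import product


def lane_splits(total: int = 40,
                widths: tuple[int, ...] = (16, 8, 4)) -> list[tuple[int, ...]]:
    """Every combination of link counts (one count per width) using exactly `total` lanes."""
    ranges = [range(total // w + 1) for w in widths]
    return [counts for counts in product(*ranges)
            if sum(c * w for c, w in zip(counts, widths)) == total]


splits = lane_splits()
print(len(splits), "ways to split 40 lanes")
for n16, n8, n4 in splits:
    print(f"{n16} x16 + {n8} x8 + {n4} x4")
```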

The C600 chipset is a modified version of the C200 chipset with more storage and I/O options suited for enterprise systems. The chipset links to the Xeon E5-2600 processors through the DMI2 link and can also make use of that PCI-Express 2.0 x4 slot.

You can hang a whole bunch of things off of this PCH, depending on what you activate in the chipset: 14 USB 2.0 ports, four or eight SAS ports running at 3Gb/sec, eight PCI-Express 2.0 slots, a plain-old PCI slot, four SATA ports running at 3Gb/sec or two SATA ports running at 6Gb/sec. There are also ports for various kinds of other storage and peripheral devices.
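SATA at 3Gb/sec and 6Gb/sec, like 3Gb/sec SAS, uses 8b/10b encoding, so only 80 per cent of the line rate carries data. The sketch below converts the quoted line rates into payload bandwidth per port; it ignores command and protocol overhead and says nothing about whether the drives behind the ports can keep up.

```python
def payload_mb_per_s(line_rate_gbps: float) -> float:
    """8b/10b: 8 data bits per 10 line bits, then 8 bits per byte."""
    return line_rate_gbps * 1000 * 8 / 10 / 8


for name, rate in (("SATA/SAS 3Gb/sec", 3), ("SATA 6Gb/sec", 6)):
    print(f"{name}: {payload_mb_per_s(rate):.0f} MB/s per port")

# Aggregate for the eight-port SAS option, if every port ran flat out
print(f"8 x SAS 3Gb/sec: {8 * payload_mb_per_s(3):.0f} MB/s")
```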

El Reg will be digging into the details on the new Xeon E5-2600 processors in terms of performance and pricing, as well as looking at how server makers deploy these chips and chipsets in their machines. Stay tuned. ®