Original URL: http://www.theregister.co.uk/2011/02/20/intel_poulson_itanium_isscc/

Intel sends 'Poulson' Itaniums to the shrink

'Designed with the future in mind'

By Timothy Prickett Morgan

Posted in Servers, 20th February 2011 21:38 GMT

ISSCC Everyone else might have pretty much abandoned the Itanium processor, but Intel and Hewlett-Packard – who co-designed the 64-bit processor – remain firmly committed. That's mainly because HP has a captive audience of HP-UX, NonStop, and OpenVMS customers that spend billions of dollars a year on systems and therefore make it worth Intel's financial while.

The future "Poulson" Itanium processor will close out the Monday sessions devoted to enterprise processors at the IEEE's International Solid-State Circuits Conference in San Francisco, but to keep the Poulson Itaniums from getting lost in the CPU news shuffle, Intel gave press and analysts a sneak peek at some of the details it will divulge at ISSCC.

Poulson is, as many expected, an eight-core processor, but it is not just a shrink of the current quad-core "Tukwila" Itanium 9300 processor that launched at last year's ISSCC event. The Poulson chip not only skips a few tick-tocks in the normal Intel pattern of alternating chip manufacturing process changes with architectural changes, but is doing a tick and a tock at the same time. Or, more precisely, two ticks and a tock at the same time, since the jump from the Tukwila chips involves moving from 65 nanometer wafer baking techniques to Intel's current (and high volume) 32 nanometer processes, skipping 45 nanometers entirely.

Those are the ticks. The tock is the new Itanium microarchitecture and a completely redesigned core that goes along with it.

The Poulson chip is the ninth in the Itanium family, which is more than a decade old and which was intended to replace x86 processors in the glorious future painted by Intel, HP, and the other Itanium partners back in the mid-to-late 1990s. That didn't happen, obviously, and everyone besides HP that had adopted Itanium has backed Intel's high-end Xeon 7500s for everything but the proprietary platforms they don't want to move to another chip architecture. Again.

While Itanium did not change the world directly, it sure did indirectly, giving Advanced Micro Devices a chance to get its 64-bit Opterons into the field because of Itanium's incompatibility with the x86 Xeons and the many delays in getting Itanium chips into systems. The initial success of the Opterons made Intel refocus on the performance, power consumption, memory addressing, and RAS features in the Xeon line, and in the end, the market ended up with a very respectable Xeon, Opteron, and Itanium lineup, providing different systems suited to different needs and giving HP, Bull, and NEC very profitable proprietary machines.

If Intel could sell a chocolate chip cookie as a processor to HP, and it worked and it cost $3,700, it would do it. Intel doesn't care. And at the prices HP charges for Integrity and Superdome 2 systems, the server maker is perfectly happy to stick with Itanium - just like Oracle/Fujitsu can command a premium for Sparc systems and IBM can for mainframe and Power systems. Itanium didn't take over the world, but like these other RISC and mainframe platforms, it sure has taken over its niches.

In a briefing on the chip, Rory McInerney, vice president of Intel's Architecture Group and director of microprocessor development, said the Poulson chip involved a "substantial redesign" and that Intel "designs this with the future in mind." What McInerney meant by that latter statement is that the microarchitecture design changes and core layout will allow Intel to scale up to the future "Kittson" Itaniums and beyond over the next couple of years with whatever chip making processes it has in volume at the time.

Here's what the Poulson Itanium die looks like:

[Figure: Intel's future "Poulson" Itanium server processor die]

The Poulson chip has eight cores, two directory caches, five QuickPath Interconnect (QPI) links, two memory controllers, two segments of shared L3 cache, and a bunch of system logic all on the same piece of silicon. It weighs in at 3.1 billion transistors and is 588 square millimeters in size. The current Tukwila Itanium chip, by comparison, has four cores and a total of 2 billion transistors, and is 700 square millimeters in area. The double shrink from 65 to 32 nanometers allows a lot more stuff to be crammed onto the chip, and it also shrinks the die by about 16 per cent and slightly lowers the thermal design power, which drops from 185 watts for top-end Tukwila parts to 170 watts for the fastest Poulson parts.
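
The die figures above are easy to check. This short Python sketch uses only the numbers quoted in this article (transistor counts, die areas, and top-bin TDPs); the helper names are illustrative, not anything Intel publishes. The numbers work out to a die about 16 per cent smaller, 55 per cent more transistors, an 8 per cent lower TDP, and roughly 1.8 times the transistor density.

```python
"""Back-of-envelope comparison of the Tukwila and Poulson dies.

All figures are the ones quoted in the article; the helper names are
illustrative, not anything Intel publishes.
"""

TUKWILA = {"transistors_bn": 2.0, "area_mm2": 700.0, "tdp_w": 185.0}
POULSON = {"transistors_bn": 3.1, "area_mm2": 588.0, "tdp_w": 170.0}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means a reduction."""
    return (new - old) / old * 100.0


def density_m_per_mm2(chip: dict) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors_bn"] * 1000.0 / chip["area_mm2"]


if __name__ == "__main__":
    for key in ("transistors_bn", "area_mm2", "tdp_w"):
        print(f"{key:15s} {pct_change(TUKWILA[key], POULSON[key]):+6.1f}%")
    ratio = density_m_per_mm2(POULSON) / density_m_per_mm2(TUKWILA)
    print(f"density ratio   {ratio:.2f}x")
```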

McInerney said that Intel is not divulging clock speeds on the Poulson chips at this time, but presumably the shrink will also allow Intel to boost the clock speed on the chips a little bit. Probably by less than 20 per cent, since power consumption, and therefore heat, rises much faster than linearly with clock speed, not least because higher clocks generally need higher voltages. Or, Intel might be using the extra transistors to implement a better variant of HyperThreading, with perhaps four threads per core as IBM has done with its Power7 chips. But the core design, discussed below, indicates that Intel is less concerned about clock speed and more concerned about how much work gets done per clock and how little energy that work takes.
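
Why a clock bump gets expensive can be sketched with the textbook first-order model of dynamic CMOS power, P ≈ C·V²·f. The Python sketch below assumes (a simplification, not an Intel figure) that supply voltage has to rise in step with frequency, which makes dynamic power grow roughly with the cube of the clock; it ignores leakage and real voltage/frequency curves. Under that assumption, a 20 per cent clock boost costs about 73 per cent more dynamic power.

```python
"""First-order dynamic power scaling: P is proportional to C * V^2 * f.

Illustrative only: assumes switched capacitance is unchanged and that
voltage scales linearly with clock speed; leakage is ignored.
"""


def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """Dynamic power relative to baseline for given frequency and voltage ratios."""
    return voltage_ratio ** 2 * freq_ratio


def power_if_voltage_tracks_clock(freq_ratio: float) -> float:
    """Assumed case: voltage rises in proportion to frequency, so P ~ f^3."""
    return relative_dynamic_power(freq_ratio, voltage_ratio=freq_ratio)


def max_clock_ratio(power_budget_ratio: float) -> float:
    """Inverse of the cubic model: clock ratio affordable within a power budget."""
    return power_budget_ratio ** (1.0 / 3.0)


if __name__ == "__main__":
    for boost in (1.05, 1.10, 1.20):
        print(f"{boost:.2f}x clock -> "
              f"{power_if_voltage_tracks_clock(boost):.2f}x dynamic power")
    # Under this model, doubling the dynamic power budget buys only ~26% more clock.
    print(f"2x power budget -> {max_clock_ratio(2.0):.2f}x clock")
```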

Would you like to share my socket?

The Poulson chip has a combined 54 MB of on-die memory, including L1 and L2 caches, tags and registers, and directory caches. 50 MB of this is in static RAM caches. There is 256 KB of "mid-level" data cache and 512 KB of "mid-level" instruction cache (what you and I would call L2 but for some reason Intel did not) on each core, plus 32 MB of shared L3 cache. That L3 cache looks like it is broken into two 16 MB segments, and in fact, Poulson looks like two four-core chips that have been interconnected (as you would expect). It is not clear how much L1 cache is on each Poulson core and how much is used for tags, registers, and directories. (We'll try to find out at ISSCC.)
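
Adding up the cache sizes McInerney gave is a useful sanity check on the 50 MB SRAM total. This Python sketch uses only the per-core and shared sizes quoted above and treats 1 MB as 1,024 KB. The mid-level caches and L3 account for 38 MB, leaving about 12 MB of SRAM for the L1 caches, directory caches, and whatever else Intel counts, a split the company has not yet detailed.

```python
"""Tally Poulson's quoted on-die cache sizes (figures from the article)."""

CORES = 8
MID_LEVEL_DATA_KB = 256         # per core ("mid-level" data cache)
MID_LEVEL_INSTRUCTION_KB = 512  # per core ("mid-level" instruction cache)
SHARED_L3_MB = 32               # shared L3, apparently split into two 16 MB halves
SRAM_CACHE_TOTAL_MB = 50        # Intel's quoted SRAM cache total

KB_PER_MB = 1024


def mid_level_total_mb(cores: int = CORES) -> float:
    """Combined mid-level (L2-class) cache across all cores, in MB."""
    return cores * (MID_LEVEL_DATA_KB + MID_LEVEL_INSTRUCTION_KB) / KB_PER_MB


def unaccounted_sram_mb() -> float:
    """SRAM left over for L1, directory caches and other arrays not itemized."""
    return SRAM_CACHE_TOTAL_MB - (mid_level_total_mb() + SHARED_L3_MB)


if __name__ == "__main__":
    print(f"mid-level caches: {mid_level_total_mb():.1f} MB")
    print(f"plus shared L3:   {mid_level_total_mb() + SHARED_L3_MB:.1f} MB")
    print(f"unaccounted SRAM: {unaccounted_sram_mb():.1f} MB")
```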

One of the delays in getting the modified Tukwila Itanium 9300s into the field in 2008 and 2009 was that server makers wanted Tukwila, Poulson, and Kittson to share the same socket. And as promised, Poulson chips will plug into the LGA 1248 sockets used by Tukwila, and so will Kittson. So upgrades will be easy. Hopefully, Intel has built some bandwidth headroom into the Itanium platform.

McInerney said that Intel did, in fact, build some headroom into the "Boxboro" chipsets and memory boards shared by Itanium 9300 and Xeon 7500 systems when the Tukwila chips came out last year. That is why Intel has been able to crank up the QPI speeds from the 4.8 GT/sec of the Tukwilas to the 6.4 GT/sec of the Poulsons. Assuming that future Xeons and Itaniums will need more bandwidth, the kicker to the Boxboro chipset will go even higher. Doubling Tukwila's 4.8 GT/sec would suggest that 9.6 GT/sec is the next stop on the QPI bus. For all we know, this is already cooked into the Boxboro chipsets, but just not activated.
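
Transfer rates in GT/sec only become bandwidth once you know how much data each transfer carries. The Python sketch below assumes a standard full-width QPI link, which averages 16 bits (2 bytes) of payload per transfer in each direction; it does not model half-width links, and it makes no claim about how Intel will configure Poulson's five links. On that basis, one link goes from 9.6 GB/sec per direction at 4.8 GT/sec to 12.8 GB/sec at 6.4 GT/sec, and would reach 19.2 GB/sec at the speculative 9.6 GT/sec.

```python
"""Convert QPI transfer rates (GT/sec) into per-link payload bandwidth.

Assumes a full-width QPI link: of the 20 bits moved per transfer in each
direction, 16 are payload on average (the rest is CRC and link control).
"""

PAYLOAD_BYTES_PER_TRANSFER = 2  # 16 payload bits per transfer, per direction


def qpi_link_gbps(gt_per_sec: float, bidirectional: bool = False) -> float:
    """Payload bandwidth in GB/sec for one full-width link."""
    one_way = gt_per_sec * PAYLOAD_BYTES_PER_TRANSFER
    return one_way * 2 if bidirectional else one_way


if __name__ == "__main__":
    for label, rate in (("Tukwila", 4.8), ("Poulson", 6.4), ("speculative", 9.6)):
        print(f"{label:12s} {rate:4.1f} GT/s -> "
              f"{qpi_link_gbps(rate):5.1f} GB/s each way, "
              f"{qpi_link_gbps(rate, bidirectional=True):5.1f} GB/s total")
```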

Here's what the new Poulson core looks like:

[Figure: The layout of the Poulson Itanium core]

The big architectural change with the Poulson Itaniums is that EPIC (explicitly parallel instruction computing), Itanium's take on very long instruction word (VLIW) instruction packaging, has been made double-wide, moving from six-wide instruction processing to twelve-wide. In theory, and provided the application's mix of instructions works out right, this should come close to doubling the performance of Poulson cores compared to Tukwila cores, clock for clock and core for core. Which is why I don't think Intel is going to boost clock speeds on the Poulson Itaniums much compared to the 1.33 GHz to 1.73 GHz of the Tukwilas. The Turbo Boost speeds could go up, though, perhaps well beyond the 1.46 GHz to 1.86 GHz range of the Tukwilas.
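
EPIC's "packaging" is concrete: Itanium code is grouped into 128-bit bundles, each holding a 5-bit template field and three 41-bit instruction slots, with the template telling the hardware which execution unit types the slots need and where the stop bits fall. Issue width can therefore be counted in bundles: six-wide is two bundles per clock, and twelve-wide corresponds to four. The Python sketch below packs and unpacks that layout; the template and slot values in it are arbitrary placeholders, not real Itanium instructions.

```python
"""Sketch of the IA-64 (Itanium) instruction bundle layout.

A 128-bit bundle = 5-bit template (bits 0-4) + three 41-bit slots
(bits 5-45, 46-86, 87-127). Values used below are placeholders.
"""

TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
BUNDLE_BITS = TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS  # 128


def pack_bundle(template: int, slots: list[int]) -> int:
    """Pack a template and three slot encodings into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(SLOTS_PER_BUNDLE)]
    return template, slots


def issue_width(bundles_per_clock: int) -> int:
    """Instructions issued per clock when whole bundles are dispersed."""
    return bundles_per_clock * SLOTS_PER_BUNDLE


if __name__ == "__main__":
    b = pack_bundle(0x10, [0x1, 0x2, (1 << SLOT_BITS) - 1])
    print(unpack_bundle(b))
    print("six-wide:", issue_width(2), "| twelve-wide:", issue_width(4))
```

Real dispersal hardware also has to honor each template's unit types and stop bits, which this sketch deliberately ignores.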

With twice as many cores, processing twice as many instructions, and possibly with twice as many HyperThreads, the Poulson chips should yield anywhere from three to five times the performance of the Tukwilas at the socket level. It depends on how many threads each core runs and on the efficiency of the twelve-wide EPIC instruction packaging. The eight other Itanium chips to date have all been six-wide chips, and it is unclear how software will take to twelve-wide pipes.
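
The three-to-five-times range can be turned around to show what it asks of each core. Assuming (a simplification, not an Intel claim) that performance scales perfectly with core count and that clocks stay roughly flat, doubling the cores accounts for 2x, so a 3x to 5x socket-level gain implies each Poulson core must deliver 1.5x to 2.5x a Tukwila core, from the wider pipes and any extra threads combined.

```python
"""Back out the per-core gain implied by a socket-level speedup estimate.

Illustrative assumptions: performance scales perfectly with core count and
clock speeds are unchanged, so socket speedup = core ratio * per-core gain.
"""

TUKWILA_CORES = 4
POULSON_CORES = 8


def implied_per_core_gain(socket_speedup: float,
                          old_cores: int = TUKWILA_CORES,
                          new_cores: int = POULSON_CORES) -> float:
    """Per-core, clock-for-clock gain needed to reach a given socket speedup."""
    return socket_speedup / (new_cores / old_cores)


if __name__ == "__main__":
    for speedup in (3.0, 4.0, 5.0):
        print(f"{speedup:.0f}x per socket -> "
              f"{implied_per_core_gain(speedup):.1f}x per core")
```

Anything above 2x per core would need the extra threads to pay off, since the twelve-wide pipes alone come, at best, close to doubling per-core performance.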

What I can tell you is that customers will not have to recompile their applications when they move to Poulson chips. "We are not anticipating that people will need to do a recompile," explains McInerney. He did add that, as with any new processor, recompiling is often necessary to squeeze every drop of performance out of a system. But the performance comparisons that Intel will make when Poulson gets closer to launch will take code compiled for prior generations of Itaniums and run it on Poulson systems unchanged.

The Poulson cores also have new data and instruction pipelines, a new floating point pipeline, and a new instruction buffer. The chip also has a number of dynamic power management features that gate power to elements of the Itanium chip and, now, to the memory controllers and memory subsystems as well. Leakage current, power draw when idle, and power draw under load have all been reduced on the Poulson chip. Take a look:

[Figure: Tukwila and Poulson power management (lower is better)]

In this chart, Intel shows several power scaling metrics for each design, normalized to Tukwila. The blue bars show Tukwila, and the red bars show what would happen if the Tukwila chip were left unchanged and simply implemented in a 32 nanometer process. The green bars show the effect of the design changes inside Poulson on these same metrics. Compared with a 32 nanometer Tukwila, Poulson cuts leakage power by only 30 per cent, but it cuts idle power by 70 per cent and power used under load (that's the TDP Activity data) by 60 per cent. In general, the power lost or consumed by the Poulsons on these metrics is about a fifth of what it is on the real 65 nanometer Tukwilas.
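
The chart's ratios compose multiplicatively: Poulson's level relative to the 65 nanometer Tukwila equals the 32 nanometer Tukwila's level times one minus Poulson's further reduction. As a rough sketch, assuming the quoted reductions are relative to the shrunk Tukwila and that "about a fifth" applies to every metric (both are readings of the chart, and the inputs are approximate), the Python below backs out roughly where the red 32 nanometer bars would have to sit.

```python
"""Compose normalized power ratios from the Tukwila/Poulson chart.

Assumptions (approximate readings of the chart, not Intel data):
  poulson_vs_tukwila = shrink_vs_tukwila * (1 - reduction_vs_shrink)
and Poulson lands at about one fifth of the 65 nm Tukwila on each metric.
"""

POULSON_VS_TUKWILA_65NM = 0.2  # "about a fifth"
REDUCTION_VS_SHRUNK_TUKWILA = {"leakage": 0.30, "idle": 0.70, "tdp_activity": 0.60}


def compose(shrink_ratio: float, reduction: float) -> float:
    """Poulson relative to 65 nm Tukwila, given the shrink ratio and extra cut."""
    return shrink_ratio * (1.0 - reduction)


def implied_shrink_ratio(overall: float, reduction: float) -> float:
    """Solve compose() for the 32 nm Tukwila ratio (the red bar)."""
    return overall / (1.0 - reduction)


if __name__ == "__main__":
    for metric, cut in REDUCTION_VS_SHRUNK_TUKWILA.items():
        red_bar = implied_shrink_ratio(POULSON_VS_TUKWILA_65NM, cut)
        print(f"{metric:13s} implied 32 nm Tukwila ~{red_bar:.2f}, "
              f"Poulson {compose(red_bar, cut):.2f}")
```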

Finally, Poulson will include a slew of new error detection, correction, and prevention technologies not in the current Tukwila Itanium chips. Intel has added error detection for floating point instructions, expanded soft error correction, and boosted cache error coverage. The chip also logs more information about errors to improve recovery, sometimes automagically.

Intel and its main Itanium partner, HP, are no doubt hoping that the Poulson specs will put to rest any talk about the impending death of Itanium.

"Intel's commitment, as evidenced by this development effort, is strong and it is unwavering," McInerney said on the call.

Don't expect some in the IT market to believe it. They never will.

Intel is not talking about when Poulson chips will be delivered, but it seems likely that it will show up in early 2012, with Kittson in early 2014. ®