Intel sends 'Poulson' Itaniums to the shrink

'Designed with the future in mind'

Would you like to share my socket?

The Poulson chip has a combined 54 MB of on-die memory, including L1 and L2 caches, tags and registers, and directory caches. 50 MB of this is in static RAM caches. There is 256 KB of "mid-level" data cache and 512 KB of "mid-level" instruction cache (what you and I would call L2 but for some reason Intel did not) on each core, plus 32 MB of shared L3 cache. That L3 cache looks like it is broken into two 16 MB segments, and in fact, Poulson looks like two four-core chips that have been interconnected (as you would expect). It is not clear how much L1 cache is on each Poulson core and how much is used for tags, registers, and directories. (We'll try to find out at ISSCC.)
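The cache figures above can be tallied to see how much of the on-die memory is itemised. The following is a minimal sketch, assuming eight cores (two four-core halves, as the layout suggests) and binary units (1 MB = 1024 KB); the function and constant names are illustrative, not anything Intel has published.

```python
# Cache sizes quoted in the article (assumption: 1 MB = 1024 KB).
CORES = 8                  # assumption: two four-core halves
MID_LEVEL_DATA_KB = 256    # per core
MID_LEVEL_INSTR_KB = 512   # per core
L3_MB = 32                 # shared across the chip
SRAM_CACHE_MB = 50         # total held in static RAM caches
ON_DIE_TOTAL_MB = 54       # caches, tags, registers, directory caches


def mid_level_total_mb(cores: int = CORES) -> float:
    """Total mid-level (L2) cache across all cores, in MB."""
    return cores * (MID_LEVEL_DATA_KB + MID_LEVEL_INSTR_KB) / 1024


def unaccounted_mb(budget_mb: float, cores: int = CORES) -> float:
    """What remains of a budget once mid-level and L3 caches are subtracted."""
    return budget_mb - (mid_level_total_mb(cores) + L3_MB)


if __name__ == "__main__":
    print(f"mid-level caches, all cores: {mid_level_total_mb():.0f} MB")
    print(f"left of the SRAM cache figure: {unaccounted_mb(SRAM_CACHE_MB):.0f} MB")
    print(f"left of the on-die total: {unaccounted_mb(ON_DIE_TOTAL_MB):.0f} MB")
```

Under those assumptions the mid-level caches come to 6 MB and, with the L3, account for 38 MB, which leaves 12 MB of the 50 MB SRAM figure (and 16 MB of the 54 MB total) for the L1 caches, tags, registers, and directories that have not been broken out.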

One of the delays in getting the modified Tukwila Itanium 9300s into the field in 2008 and 2009 was that server makers wanted Tukwila, Poulson, and Kittson to share the same socket. And as promised, Poulson chips will plug into the LGA 1248 sockets used by Tukwila, and so will Kittson. So upgrades will be easy. Hopefully, Intel has built some bandwidth headroom into the Itanium platform.

McInerney said that Intel did, in fact, have some headroom in the "Boxboro" chipsets and memory boards that are shared by Itanium 9300 and Xeon 7500 systems when Tukwila chips came out last year. That is why Intel has been able to crank up the QPI speeds from the 4.8 GT/sec of the Tukwilas to the 6.4 GT/sec of the Poulsons. Assuming that the future Xeons and Itaniums will need more bandwidth, then the kicker to the Boxboro chipset will go even higher. Base 2 math would suggest that 9.6 GT/sec is the next stop on the QPI bus. For all we know, this is already cooked into the Boxboro chipsets, but just not activated.
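To put those transfer rates into bandwidth terms, here is a rough sketch. It assumes a full-width QPI link carrying 16 data bits (2 bytes) per transfer in each direction, which is how QPI bandwidth is usually quoted; the helper name is made up for illustration, and the 9.6 GT/sec entry is only the speculated next step, not an announced speed.

```python
BYTES_PER_TRANSFER = 2  # assumption: 16 data bits per transfer, per direction


def qpi_bandwidth_gb_s(gt_per_sec: float, bidirectional: bool = False) -> float:
    """Approximate QPI link bandwidth in GB/sec for a given transfer rate."""
    per_direction = gt_per_sec * BYTES_PER_TRANSFER
    return per_direction * 2 if bidirectional else per_direction


if __name__ == "__main__":
    rates = {
        "Tukwila": 4.8,
        "Poulson": 6.4,
        "speculated next step": 9.6,  # a guess from the article, not confirmed
    }
    for name, rate in rates.items():
        one_way = qpi_bandwidth_gb_s(rate)
        both_ways = qpi_bandwidth_gb_s(rate, bidirectional=True)
        print(f"{name}: {rate} GT/sec = {one_way:.1f} GB/sec each way, "
              f"{both_ways:.1f} GB/sec both ways")
```

On that basis the move from 4.8 GT/sec to 6.4 GT/sec is a one-third increase in link bandwidth per direction.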

Here's what the new Poulson core looks like:

The layout of the Poulson Itanium core

The big architectural change with the Poulson Itaniums is that the EPIC very long instruction word packaging mechanism has been made double-wide, moving from six-wide instruction processing to twelve-wide. In theory, and providing the application's mix of instructions works out right, this should come close to doubling the performance of Poulson cores compared to Tukwila cores, clock for clock and core for core. Which is why I don't think Intel is going to boost clock speeds on the Poulson Itaniums compared to the 1.33 GHz to 1.73 GHz of the Tukwilas. The TurboBoost speed could go up, though, and well beyond the 1.46 GHz to 1.86 GHz range of the Tukwilas.

With twice as many cores, processing twice as many instructions, and possibly with twice as many HyperThreads, the Poulson chips should yield anywhere from three to five times the performance of the Tukwilas at the socket level. It depends on the threads and the efficiency of the twelve-wide EPIC instruction packaging. The eight other Itanium chips to date have all been six-wide chips, and it is unclear how software will take to twelve-wide pipes.
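One way to see where a three-to-five-times range could come from is a simple multiplicative model. This is a back-of-the-envelope sketch, not Intel's projection: the function name and the efficiency figures for issue width and threading are hypothetical values picked only to bracket the range.

```python
def socket_speedup(core_ratio: float, issue_gain: float, thread_gain: float) -> float:
    """Naive socket-level speedup: the product of independent per-factor gains."""
    return core_ratio * issue_gain * thread_gain


# Hypothetical scenarios. Only the doubling of cores comes from the article;
# the issue-width and threading gains are illustrative guesses.
SCENARIOS = {
    "twelve-wide issue only half pays off": (2.0, 1.5, 1.0),
    "twelve-wide issue fully pays off": (2.0, 2.0, 1.0),
    "plus a modest gain from extra threads": (2.0, 2.0, 1.25),
}

if __name__ == "__main__":
    for label, (cores, issue, threads) in SCENARIOS.items():
        print(f"{label}: {socket_speedup(cores, issue, threads):.1f}x")
```

Real workloads will not multiply out this neatly, since the gains from wider issue and from more threads compete for the same execution resources, so treat the three scenarios as rough bounds rather than predictions.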

What I can tell you is that customers will not have to recompile their applications when they move to Poulson chips. "We are not anticipating that people will need to do a recompile," explains McInerney. He did add that, as is the case with any new processor, recompiling is often necessary to squeeze every drop of performance out of a system. But the performance comparisons that Intel will be making when Poulson gets closer to launch will be for code that was compiled on prior generations of Itaniums and plunked onto the Poulson systems unchanged.

The Poulson cores also have new data and instruction pipelines, a new floating point pipeline, and a new instruction buffer. The chip also has a number of dynamic power management features that gate power usage on elements of the Itanium chip and, now, on the memory controllers and memory subsystems as well. Leakage current, power draw when idle, and power draw under load have all been reduced on the Poulson chip. Take a look:

Tukwila and Poulson power management (lower is better)

In this chart, Intel shows the ratio of Tukwila to Poulson on several power scaling metrics. The blue bars show Tukwila and the red bars show what would happen if the Tukwila chip were left unchanged and just implemented in a 32 nanometer process. The green bars show the effect of the design changes inside Poulson on these same metrics. Compared to a 32 nanometer Tukwila, Poulson's design changes cut power leakage by only a further 30 per cent, but they cut idle power usage by a further 70 per cent and power used under load (that's the TDP Activity data) by a further 60 per cent. In general, the power lost or consumed by the Poulsons on these metrics is about a fifth of what it is on the real 65 nanometer Tukwilas.

Finally, Poulson will include a slew of new error detection, correction, and prevention technologies not in the current Tukwila Itanium chips. Intel has added error detection for floating point instructions, expanded soft error correction, and boosted cache error coverage. The chip also allows for the logging of more information about errors in the chips to improve recovery, sometimes automagically.

Intel and its main Itanium partner, HP, are no doubt hoping that the Poulson specs will put to rest any talk about the impending death of Itanium.

"Intel's commitment, as evidenced by this development effort, is strong and it is unwavering," McInerney said on the call.

Don't expect some in the IT market to believe it. They never will.

Intel is not talking about when Poulson chips will be delivered, but it seems likely that they will show up in early 2012, with Kittson in early 2014. ®
