The Register® — Biting the hand that feeds IT

Poulson Itaniums hit 'Replay' for reliability

New instructions, better HyperThreading

Hot Chips The future eight-core "Poulson" Itanium is not just a process-shrink of the current four-core "Tukwila" Itanium 9300. Intel has been working to add new features to Poulson to make it useful running enterprise workloads – and to do so more reliably.

Intel already released a lot of Poulson details back in February at the IEEE International Solid-State Circuits Conference in San Francisco, and at the Hot Chips conference at Stanford University late last week, the company lifted the veil a little more – and continued to keep its head down in the legal spat between Oracle and HP over Itanium's long-term fate.

Steve Undy, technical lead design engineer for Poulson at Intel, gave a presentation that walked through some of the chip's new features. And perhaps more important than any feature, Undy confirmed that Poulson is in post-silicon validation and has been booted and tested on multiple operating systems and in different system topologies.

HP's HP-UX, OpenVMS, and NonStop operating systems are expected to be available on the Poulson chips, as are SUSE Linux Enterprise Server and a number of proprietary operating systems from Fujitsu, NEC, and Bull. Poulson is on track for shipment in 2012.

A statement from Intel that was released late last week in conjunction with Undy's Hot Chips presentation said that the new Poulson instructions are intended "to help take future Itanium performance to the next level and to lay the foundation for the future of Itanium computing." The statement ended by saying that the follow-on "Kittson" Itanium processor is under development.

Intel Poulson Itanium Chip

Intel's Poulson Itanium processor, scheduled for 2012

Like the Xeon processors, the Poulson Itaniums have a "core out" design that puts the cores on the outside edges of the chip with a shared L3 cache in the center, all linked together by a fast ring interconnect. On Poulson, the L3 cache weighs in at 32MB, and the chip has two integrated DDR3 main-memory controllers with a total of four Scalable Memory Interface (SMI) links out to memory boards.

Poulson has four full-width and two half-width QuickPath Interconnect (QPI) links, which run at 6.4GT/sec. The chips are baked in a 32-nanometer process, have an area of 544 square millimeters, have 3.1 billion transistors, and have a maximum thermal design point of 170 watts with all cores humming along.

Intel has not yet talked about clock speeds, but the speculation is that they won't change much from the current Itanium 9300s, which were launched in February 2010 and which run at between 1.33GHz and 1.73GHz. These Tukwila Itaniums are made in Intel's 65-nanometer process, have just over 2 billion transistors, and peak out at 185 watts across their four cores.
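For a rough sense of scale, the chip-level figures quoted above can be divided per core. The sketch below uses only numbers from this article; the Tukwila transistor count is rounded down to 2.0 billion from "just over 2 billion", and the `Chip` class and its method names are made up for illustration. An even split also ignores the large share of transistors and power that goes to cache and interconnect, so treat the results as a back-of-the-envelope comparison rather than a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Chip-level figures as quoted in the article (hypothetical container)."""
    name: str
    cores: int
    transistors_bn: float  # billions of transistors, whole die
    tdp_watts: float       # maximum thermal design point, all cores active

    def watts_per_core(self) -> float:
        # Naive even split of the whole-chip power budget across cores.
        return self.tdp_watts / self.cores

    def transistors_per_core_bn(self) -> float:
        # Naive even split; includes cache and uncore, not just core logic.
        return self.transistors_bn / self.cores


POULSON = Chip("Poulson", cores=8, transistors_bn=3.1, tdp_watts=170.0)
# Assumption: "just over 2 billion" is rounded down to 2.0 here.
TUKWILA = Chip("Tukwila", cores=4, transistors_bn=2.0, tdp_watts=185.0)

for chip in (POULSON, TUKWILA):
    print(
        f"{chip.name}: {chip.watts_per_core():.2f} W per core, "
        f"{chip.transistors_per_core_bn():.2f}bn transistors per core"
    )
```

On these figures the power budget per core drops from 46.25 watts on Tukwila to 21.25 watts on Poulson, which is roughly what you would hope for from a jump of two process generations.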

Intel Poulson Itanium chip block diagram

Schematic of Intel's Poulson Itanium chip

The Poulsons will offer twice the cores of the Tukwilas, QPI and SMI links that run 50 per cent faster, plus 33 per cent more L3 cache on-chip. The Poulsons will not scale beyond eight sockets in symmetric multiprocessing configurations – the same limit as the Tukwilas. Presumably the faster QPI and SMI links will help SMP performance, however.

The Poulsons will plug into the same sockets used by Itanium 9300 servers, and that might mean customers running HP's Integrity servers will do processor upgrades before they do system upgrades. This may or may not be good news for HP, but at this point, HP has admitted that Oracle's decision back in March to stop development of its database, middleware, and application software for Itanium has adversely impacted Integrity server sales. In some cases, customers are putting off buying machines, and in others they've canceled orders.

There is more to the Poulson chips than just adding cores to the die and hooking them up with a ring interconnect. The Poulson cores themselves are different. Here's what they look like, schematically:

Intel Poulson Itanium core schematic

Block diagram of the Poulson Itanium core

The first interesting thing to note is that the Poulson core has fewer transistors than the Tukwila core (89 million versus 109 million) and occupies less than a third of the area, while at the same time maintaining application compatibility and doubling the instruction pipeline width to 12 instructions.

One of the new features in that updated Itanium pipeline is called Instruction Replay Technology, which is designed to improve system uptime. With the IRT feature, Intel has put an instruction buffer in the pipeline. If an instruction goes haywire as it moves down the Poulson pipeline, the errant instruction is re-executed from the instruction buffer rather than crashing the system or corrupting data.
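Intel has not described the mechanism in any more detail than that, so the following is only a toy software model of the general idea: an instruction stays in a buffer until it completes cleanly, and a detected transient error triggers a retry from the buffer instead of a crash. Every name here (`TransientFault`, `run_with_replay`, `flaky`, the `max_replays` limit) is hypothetical and does not correspond to anything in the Poulson hardware.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class TransientFault(Exception):
    """Stands in for a soft error detected while an instruction executes."""


# A toy "instruction": a label plus a callable that produces its result.
Instruction = Tuple[str, Callable[[], int]]


def run_with_replay(program: List[Instruction], max_replays: int = 3) -> List[int]:
    """Execute instructions in order, replaying any that hit a transient fault."""
    buffer: Deque[Instruction] = deque(program)
    results: List[int] = []
    while buffer:
        name, execute = buffer[0]  # peek: stays buffered until it retires
        for _attempt in range(max_replays + 1):
            try:
                results.append(execute())
                break  # completed cleanly
            except TransientFault:
                continue  # replay the same instruction from the buffer
        else:
            # Every attempt faulted: treat it as a hard error.
            raise RuntimeError(f"{name} still failing after {max_replays} replays")
        buffer.popleft()  # retire the instruction
    return results


def flaky(value: int, failures: int) -> Callable[[], int]:
    """Build an operation that faults `failures` times, then returns `value`."""
    remaining = [failures]

    def execute() -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault()
        return value

    return execute


program = [("add", flaky(3, 0)), ("mul", flaky(12, 2)), ("sub", flaky(-1, 0))]
print(run_with_replay(program))  # [3, 12, -1]
```

The `mul` operation faults twice and succeeds on its third attempt, so the program still produces all three results. A fault that never clears exhausts `max_replays` and is reported as an error, since replay can only help with transient problems.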

This instruction buffer in the Poulson pipeline has another important role to play in an improved HyperThreading scheme that will debut with these future Itanium chips. The buffer breaks the pipeline into a front-end and a back-end, creating a dual-domain multithreading scheme that allows the front-end and back-end parts of the pipeline to be independently threaded.

Intel's chip engineers have also added pipeline-specific thread switch mechanisms to deal with this more complex and wider Poulson pipeline, as well as dual-threaded register files, dual-threaded data side translation buffers (TLBs), and a new fairness mechanism.

Intel is also adding a number of new instructions with the Poulson Itaniums to improve thread control, expand prefetching of data and instructions for the pipeline, and add data-access hints for the L1 caches. The Poulson also has three new integer operations to boost the performance of legacy Itanium code without requiring applications to be recompiled. ®

New instructions for legacy code?

"The Poulson also has three new integer operations to boost the performance of legacy Itanium code without requiring for applications to be recompiled."

How does that make any sense? If it is legacy code and isn't recompiled, it doesn't use the new operations...


Can't see any deckchairs in the pics.

Have Intel moved them?


Instruction replay?

Anybody know what "instruction replay" does for you that hasn't been doable (given a smart enough OS) on any demand-paged virtual memory machine since, say, a VAX?

Something unwanted happens (e.g. the virtual to physical address translation hardware says "that address isn't accessible in memory right now") and an exception is caused.

The OS handles the exception. In the case of demand paged virtual memory, some behind the scenes magick happens to make it look as though the required page of memory is there.

The OS backs off the PC and the instruction is replayed from the top (or, if it's a multi-part instruction, replayed from where it was interrupted).

VAXes and VMS did that in 1978, as did everyone else doing demand paged virtual memory.

Itanium has a well known problem with exception handling, in that taking an exception disproportionately screws up the program (and system) performance, even if it's something relatively minor like an alignment fault. Maybe they've finally got around to doing something about that in hardware (same as it took the Alpha guys a few years to realise that lack of byte/word instructions *was* actually a problem in a Microsoft-dominated world)?

Would've been nice to have some details, either here or in the 10-slide Intel Powerpoint on which the article is based.

