Pentium 4 dissected

Gory details of inner workings revealed

What do the 42 million transistors on a Pentium 4 actually do?

Willamette, aka Pentium 4, is the first all-new processor Intel has launched since the Pentium Pro. Sure, there have been the Pentium II, Pentium III, Celeron and Xeon, but these all use the P6 microarchitecture introduced with the PPro.

The problem with P6 is that its pipeline design imposes an absolute speed limit which, on a 0.18 micron process, works out at around 1.2GHz. Try to run it any faster than that and it just gets hotter rather than doing any more useful work.

The problems Chipzilla encountered with the 1.13GHz PIII are testament to the fact that the PIII is perilously close to its absolute speed limit.

P4 is entirely new and uses the tragically trademarked NetBurst architecture with "hyper-pipelined technology": a pipeline twice the length of P6's. Splitting the work into more, shorter stages means each stage does less per clock tick, which significantly increases frequency scalability.
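Why a longer pipe buys clock speed can be sketched with a textbook first-order model: the clock period is the total logic delay divided by the number of stages, plus a fixed per-stage latch overhead. The delay figures below are invented purely for illustration and are not Intel numbers; the 10 and 20 stage depths simply stand in for "twice the length".

```python
# Hypothetical timing parameters, invented purely for illustration.
LOGIC_DELAY_PS = 8000.0   # total logic delay for one instruction's trip through the pipe
LATCH_OVERHEAD_PS = 50.0  # fixed per-stage cost (pipeline latches, clock skew)


def clock_ghz(stages: int) -> float:
    """Maximum clock if the logic is split evenly across `stages` stages."""
    period_ps = LOGIC_DELAY_PS / stages + LATCH_OVERHEAD_PS
    return 1000.0 / period_ps  # 1000 ps per ns, so this yields cycles per ns, i.e. GHz


if __name__ == "__main__":
    for depth in (10, 20):
        print(f"{depth:2d} stages: {clock_ghz(depth):.2f} GHz")
```

With these made-up numbers, doubling the depth takes the clock from about 1.18GHz to 2.22GHz: nearly, but not quite, double, because the per-stage latch overhead does not shrink along with the logic.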

The downside is that the longer pipe executes fewer instructions per clock tick than a PIII (or Athlon), not least because every mispredicted branch flushes more half-finished work out of the pipeline. So at comparable clock speeds, a PIII or Athlon can outperform a P4.
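The trade-off can be put into rough numbers with a simple model: throughput is clock speed divided by cycles per instruction (CPI), and each mispredicted branch costs roughly one pipeline's worth of stages. The article gives no IPC figures, so the workload parameters and the 10 versus 20 stage depths below are hypothetical, chosen only to show the shape of the argument.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    pipeline_stages: int  # stages refilled after a branch mispredict (simplified)
    clock_mhz: float


# Hypothetical workload parameters, chosen only for illustration.
BASE_CPI = 1.0         # cycles per instruction with perfect branch prediction
BRANCH_FRACTION = 0.2  # fraction of instructions that are branches
MISPREDICT_RATE = 0.1  # fraction of branches that are mispredicted


def cpi(core: Core) -> float:
    """Average cycles per instruction, charging a full pipeline refill per mispredict."""
    penalty = BRANCH_FRACTION * MISPREDICT_RATE * core.pipeline_stages
    return BASE_CPI + penalty


def mips(core: Core) -> float:
    """Millions of instructions per second: clock (MHz) divided by CPI."""
    return core.clock_mhz / cpi(core)


if __name__ == "__main__":
    for core in (Core("P6 @ 1.0GHz", 10, 1000), Core("P4 @ 1.0GHz", 20, 1000),
                 Core("P6 @ 1.2GHz", 10, 1200), Core("P4 @ 1.5GHz", 20, 1500)):
        print(f"{core.name}: CPI {cpi(core):.2f}, {mips(core):.0f} MIPS")
```

At the same 1GHz the shorter pipe wins (about 833 against 714 MIPS in this toy model), but once the deeper design is clocked past the older one's ceiling, here 1.5GHz against 1.2GHz, it pulls ahead at roughly 1,071 against 1,000 MIPS.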

This is an anomaly that will disappear as P4 moves ever onward and upward to clock speeds physically unattainable by the older architectures.

P4's Rapid Execution Engine isn't something introduced by Dubya Bush to reduce the backlog of people on death row in Texas prisons, but a mechanism that runs the processor's arithmetic logic units (ALUs) at twice the core frequency of the rest of the chip.
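What doubling the ALU clock means for a run of simple integer operations can be sketched as below. The big simplifying assumption, labelled in the code too, is that each dependent operation takes exactly one ALU clock and nothing else in the chip holds it up; the function names are hypothetical.

```python
def alu_clock_ghz(core_clock_ghz: float) -> float:
    """The article's claim: the ALUs run at twice the core frequency."""
    return 2.0 * core_clock_ghz


def dependent_chain_ns(n_ops: int, core_clock_ghz: float, double_pumped: bool = True) -> float:
    """Time for a chain of n dependent simple integer ops.

    Simplifying assumption: each op takes exactly one ALU clock and nothing
    else (fetch, scheduling, memory) gets in the way.
    """
    alu_ghz = alu_clock_ghz(core_clock_ghz) if double_pumped else core_clock_ghz
    return n_ops / alu_ghz  # cycles divided by cycles-per-ns gives ns


if __name__ == "__main__":
    print(dependent_chain_ns(1500, 1.5))                       # 500.0
    print(dependent_chain_ns(1500, 1.5, double_pumped=False))  # 1000.0
```

On a 1.5GHz core the ALUs tick at 3GHz, so in this idealised model the same chain finishes in half the time it would take at core speed.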

Screaming Sindy gets more extensions

The Pentium 4 also has improved dynamic execution, with more accurate branch prediction. An execution trace cache stores decoded instructions (micro-ops), which takes the decoder out of the main instruction loop. The P4 also supports 144 new Streaming SIMD Extensions 2 (SSE2) instructions, covering double-precision floating point, 128-bit SIMD integer, and improved cache and memory management.
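The point of the trace cache, decoding once and then replaying the decoded micro-ops, can be shown with a toy model. This is a sketch, not Intel's design: a real trace cache has finite capacity and stitches instructions together along predicted branch paths, and this one ignores both. `toy_decode` is a hypothetical stand-in for the x86 decoder.

```python
class TraceCache:
    """Toy trace cache: start address -> previously decoded micro-op sequence.

    Simplifications: unlimited capacity, no eviction, one trace per address.
    """

    def __init__(self, decode):
        self._decode = decode  # slow path: the x86 decoder
        self._traces = {}
        self.decodes = 0       # how often the decoder actually ran

    def fetch(self, addr):
        trace = self._traces.get(addr)
        if trace is None:      # miss: decode once and remember the result
            self.decodes += 1
            trace = self._decode(addr)
            self._traces[addr] = trace
        return trace           # hit: micro-ops come straight from the cache


def toy_decode(addr):
    """Hypothetical stand-in decoder: pretends every address decodes to three micro-ops."""
    return [f"uop{i}@{addr:#x}" for i in range(3)]


if __name__ == "__main__":
    tc = TraceCache(toy_decode)
    for _ in range(1000):  # a hot loop starting at the same address
        tc.fetch(0x400)
    print(tc.decodes)      # 1
```

A thousand trips round the loop cost a single decode, which is the sense in which the decoder drops out of the main instruction loop.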

The i850 (Tehama) chipset supports dual-channel Rambus memory and an effective 400MHz FSB, each with a peak throughput of 3.2GB/sec, while AGP 4X graphics run at over 1GB/sec, twice as fast as AGP 2X.
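Those headline figures fall out of simple arithmetic: peak bandwidth is bytes per transfer multiplied by transfers per second. The bus widths and base clocks used below (64-bit quad-pumped 100MHz FSB, 16-bit PC800 Rambus channels, 32-bit 66MHz AGP) are the standard published figures rather than anything stated in this article.

```python
def peak_mb_per_s(width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth = bytes per transfer x transfers per second (decimal MB/s)."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Bus widths and clocks are standard published figures, not taken from the article.
fsb = peak_mb_per_s(64, 100, 4)        # 64-bit FSB, 100MHz clock, quad-pumped: "400MHz"
rdram = 2 * peak_mb_per_s(16, 400, 2)  # two 16-bit PC800 channels, 400MHz, double data rate
agp4x = peak_mb_per_s(32, 200 / 3, 4)  # 32-bit AGP at 66.67MHz, four transfers per clock
agp2x = peak_mb_per_s(32, 200 / 3, 2)  # same bus, two transfers per clock

if __name__ == "__main__":
    print(f"FSB {fsb:.0f}MB/s, RDRAM {rdram:.0f}MB/s, "
          f"AGP 4X {agp4x:.0f}MB/s, AGP 2X {agp2x:.0f}MB/s")
```

Both the FSB and the pair of Rambus channels come to 3,200MB/s, which is why two channels are needed to keep the bus fed, and AGP 4X's roughly 1,067MB/s is exactly double AGP 2X.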

The move to 0.13 micron in the second half of next year also sees Intel moving to copper interconnects for the first time. Alongside this, a move to 300mm wafers will reduce production costs. Intel claims the change from aluminium to copper will produce a speed increase of around 65 per cent, while using less power and generating less heat. The smaller die size alone will reduce costs by around 30 per cent.
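Why bigger wafers and smaller dies cut costs can be sketched with a common first-order dies-per-wafer estimate: gross wafer area over die area, minus a correction for partial dies lost around the circular edge. The 200 mm² die size is hypothetical, and the estimate ignores defects, scribe lines and the higher cost of processing a 300mm wafer, so treat it as an illustration rather than a cost model.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Common first-order estimate of whole dies per wafer.

    Gross wafer area divided by die area, minus a correction for partial
    dies lost around the circular edge. Ignores defects, scribe lines and
    edge exclusion.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


if __name__ == "__main__":
    DIE_MM2 = 200.0  # hypothetical die size, for illustration only
    for wafer in (200, 300):
        print(f"{wafer}mm wafer: ~{dies_per_wafer(wafer, DIE_MM2)} dies")
```

The 300mm wafer has 2.25 times the area of a 200mm one, but in this estimate it yields about 306 dies against 125, roughly 2.45 times as many, because proportionally less silicon is wasted at the edge. Passing a smaller die area shows the same effect for a process shrink.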

By 2003, Intel plans to have five fabs producing 0.13 micron 200mm wafers and three at 300mm. ®