P4 – Now clock this
More chuckling boffins on the case
Well, when you start going on about clock speed, you know you're in for it. Geoffrey Barnett hasn't stopped laughing since last week:
I just read James Perry's defense of why the Itanium has lower clock speeds than the P4 and nearly laughed my arse off! Why, you might ask? 'Tis quite simple, really -- this is the exact same argument that us fellows in the Macintosh/PowerPC camp have been using for the past few years!!
Of course, when WE put it forth, the Wintel gearheads all scoff merrily and flame us until we resemble overdone charcoal, so imagine my surprise when I see one of 'em stealing OUR argument to defend an Intel product against - ANOTHER Intel product!
Oh, the brazenness! Oh, the hypocrisy! Oh, my ribs!
Dr. David Crocker then chipped in with some advanced boffinspeak:
The IA64 architecture was indeed intended to reach high clock speeds. Modern x86 processors have a number of complicated features, namely:
- Complicated instruction decode logic (because of variable instruction length)
- Out-of-order execution (needed to raise the average number of instructions per clock much above 1)
- Register renaming (needed because of the small number of x86 registers, and to make it more likely that multiple instructions can be executed simultaneously)
IA64 was intended to enable more instructions to be executed in parallel (basically by having lots more registers, but with a few neat features like predicated instructions as well) and also to greatly simplify the instruction decode and issue logic, enabling an increase in clock frequency and/or a decrease in pipeline length.
However, since the IA64 architecture was designed, Intel and AMD engineers have managed to push far more performance out of the x86 architecture than was previously thought possible. Meanwhile, the Itanic engineers seem to have made a pig's ear of the design. An IA64 processor *should* be much simpler to build than an x86 processor and (assuming the same process technology) should either clock at a higher rate or have fewer pipeline stages. Maybe Intel/HP will achieve this in McKinley.
As to the Itanic executing at least 3 instructions per clock instead of an average of just over 1 for x86: this assumes that the compiler can always find 3 instructions that are independent of each other. In practice, quite often the compiler will be unable to do this and will insert NOPs, or the compiler will schedule speculative (predicated) instructions which may turn out to be wasted.
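The NOP-padding effect can be sketched with a toy scheduler. This is only an illustration under loud assumptions: the `Instr` type, the `pack_bundles` function and the register names are invented for the example, the packing is a naive in-order greedy pass, and none of the real IA64 bundle templates or stop rules are modelled. It simply starts a new 3-slot bundle whenever an instruction depends on a result produced earlier in the current bundle.

```python
from typing import NamedTuple

BUNDLE_WIDTH = 3  # slots per bundle, echoing the "3 instructions per clock" figure


class Instr(NamedTuple):
    """Hypothetical instruction: a name, one destination, some source registers."""
    name: str
    dest: str
    srcs: tuple


def pack_bundles(program):
    """Greedy in-order packing; pad each unfinished bundle with NOPs."""
    bundles, current, written = [], [], set()
    for ins in program:
        # Dependent if it reads or overwrites a register written in this bundle.
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        if depends or len(current) == BUNDLE_WIDTH:
            bundles.append(current + ["nop"] * (BUNDLE_WIDTH - len(current)))
            current, written = [], set()
        current.append(ins.name)
        written.add(ins.dest)
    if current:
        bundles.append(current + ["nop"] * (BUNDLE_WIDTH - len(current)))
    return bundles


def useful_per_bundle(bundles):
    """Average number of non-NOP slots per bundle."""
    useful = sum(1 for b in bundles for slot in b if slot != "nop")
    return useful / len(bundles)


# Three independent adds fill one bundle completely.
INDEPENDENT = [
    Instr("a", "r1", ("r10", "r11")),
    Instr("b", "r2", ("r12", "r13")),
    Instr("c", "r3", ("r14", "r15")),
]
# A dependency chain: each add needs the previous result.
CHAIN = [
    Instr("a", "r1", ("r10", "r11")),
    Instr("b", "r2", ("r1", "r12")),
    Instr("c", "r3", ("r2", "r13")),
]

print(pack_bundles(INDEPENDENT), useful_per_bundle(pack_bundles(INDEPENDENT)))
print(pack_bundles(CHAIN), useful_per_bundle(pack_bundles(CHAIN)))
```

In this toy model the independent sequence issues three useful instructions in one bundle, while the chained sequence needs three bundles carrying one useful instruction each, which is no better than the "just over 1" quoted for x86.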
IA64 will have an advantage over IA32 in floating-point intensive software (because the x86 has an inefficient stack-based floating point architecture) - although that advantage may be lost if software is recompiled for SSE2 on IA32 - but in order to obtain a significant improvement in integer performance over an IA32 processor, my expectation is that Itanic will need to be clocked almost as fast.
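The clock-speed point comes down to simple arithmetic: useful work per second is roughly clock rate times sustained instructions per clock. The sketch below makes that explicit. The function name `required_clock_mhz` and every number in it are assumptions chosen for illustration, not measurements of any real chip, and it ignores the fact that an IA64 instruction and an x86 instruction do not do identical amounts of work.

```python
def required_clock_mhz(rival_clock_mhz, rival_ipc, own_ipc):
    """Clock needed to match a rival's instruction throughput.

    Throughput is modelled as clock * sustained instructions per clock,
    a deliberate simplification.
    """
    if own_ipc <= 0:
        raise ValueError("own_ipc must be positive")
    return rival_clock_mhz * rival_ipc / own_ipc


# Hypothetical rival: 2000 MHz at 1.1 instructions per clock ("just over 1").
RIVAL_CLOCK_MHZ = 2000
RIVAL_IPC = 1.1

# If the wide design really sustained 3 per clock, a modest clock would do.
print(round(required_clock_mhz(RIVAL_CLOCK_MHZ, RIVAL_IPC, 3.0)))

# If NOPs and wasted speculation drag it down to 1.3, it must clock far higher.
print(round(required_clock_mhz(RIVAL_CLOCK_MHZ, RIVAL_IPC, 1.3)))
```

The closer the sustained figure falls towards the rival's, the closer the required clock gets to the rival's clock, which is the expectation stated above for integer code.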