Mac performance: up to snuff or up the duff?
Putting the Squawk into SPEC
Letters Subject: SPEC2000 g4 benchmarks article
I had a couple of objections to your article because I think it made some errors of omission in three areas. The first is the comments about OS latency, the second is relative floating point performance and SSE2, and the third is P3 vs P4 in real world scientific apps. As background, I want to underline that I maintain a pretty platform-independent view: mission critical systems at my company run on Linux, BSD, Solaris, OS X, and even Win2K where appropriate.
First, with regard to operating system latency. I think it is important to note the difference between the occasional irritating lag when waiting for the damn beach ball to stop spinning, and actual latencies in the OS. OS X boasts some very nice features in this regard, particularly in audio processing latency under load. OS X has a higher latency at idle than most other OS's, but its latency under load stayed the same, and was better than the other OS's. Nearly fixed latency under load is pretty damn nice for a non-real-time OS.
On another note, I think you could have better covered the subject of scientific computing. I could go on endlessly about how programmers at my shop have demonstrated again and again that if you vector, the G4 conquers all, and if you don't, it only maintains clock parity, but I'd rather give you something more objective to use as a riposte.
Visit Apple's Advanced Computing Group web page at http://developer.apple.com/hardware/ve/acgresearch.html and scroll to the bottom. You will find a link to a white paper from NASA (admittedly dated, as it lists a 500 MHz G4 as the fastest available) that evaluates the G4 for scientific computation against Cray, Alpha, MIPS, and P3 based systems in terms of flops per dollar. It concludes that the G4 offers "bang for the buck" advantages by factors of between 5 and 8 over Alpha and P3 systems.
Their conclusions match those of my programmers perfectly. Without vector optimization, the G4 does not fare well against its competitors, but as soon as you vector your code, it absolutely dominates.
As for SSE2 vs. AltiVec, SSE2 is a toy by comparison. Its architecture does not offer the range of generalized high precision capability that the AltiVec instruction set does. It is filled with bandwidth limitations, particularly its tiny number of harder to use registers, which make it nearly impossible to keep the pipeline full, and it is capable of basically no parallelism whatsoever with the regular FP unit on the processor (which means it must start and stop each unit to switch back and forth, and the lack of generalization makes this an excruciating performance penalty). The small number of registers in particular makes the P3 a better scientific computing processor than the P4 for real world applications, because the P4's pipe is too deep to keep it filled. This can be graphically demonstrated with fully optimized applications that force significant branching on real world data.
The newer PPC7450 series machines have even better vector performance than their predecessors. For a ruthless, extremely optimized cross-platform demonstration, try running distributed.net's RC5 cracker on a G4 against anything else. The dual 1 GHz G4 cracks more than 25 million keys a second. That is more than an order of magnitude faster than a 1.5 GHz P4 from Dell on the exact same job. As for the P3, it does better than the P4, but still runs three times slower per clock than the 7400 G4. (The newer 7450 is substantially 3x faster than clock.) A 1 GHz P3 outperforms the 1.5 GHz P4 by a sound margin. This application is a good demonstration of the problems with the P4's ridiculously deep pipes and crappy registers.
For strictly integer performance, the G4 only achieves clock parity. The higher clock x86 machines are clearly faster at integer than the G4.
Justin G. Cordesman
President, Dark Side Research, Inc.
I have great faith in Apple.
Look at OS X 10.0 compared to 10.1. We'll see you at 10.3 and G5s.
Remember, Apple was on its deathbed just 4 years ago.
I do believe that Moto has to get serious about the PPC, but then again, as they said, they are giving it a couple of years to see if it is feasible to stay in that field. If it isn't, then it would be up to Apple to either buy them out or start using AMD.
Either way, again I think Apple will mature and kick some serious butt.
Yup, it's a shame that Apple didn't go with the 64-bit Alpha chip...
If they had, Compaq might not have given the line over to Intel (even though Samsung is still making them, it looks like a dead end).
Can you imagine a Mac with a 256-bit wide data bus at 2GHz (on top of the 64-bit 132MHz PCI)?
I concur. I have a 366 MHz PII Thinkpad 600e with Win 2k and it runs similar apps (i.e. Java) faster than my new 550 MHz TiBook. Depressing, to say the least.
(Except for the screen and battery length, the Thinkpad is the better machine, too: better keyboard, nipple, ergonomic wrist area, rubberized case--so much for "Insanely Great".)
Too bad I hate MS so much and Linux is such a joke as a workstation OS.
From: Stephen Brown
To: [email protected]
You have just been deleted.
Any website that would post such crap is crap.
I would guess that you Linux people are feeling a little threatened these days since Apple is now the largest UNIX distributor in the world. There is no more room in my bookmarks for the Reg.
Stephen Brown ®
Good by, Stephen. Sorr t se yo g. ®