Apple denies fiddling G5, Xeon tests
Value of benchmarks as a whole still in question
Apple has defended its benchmark comparison of its new Power Mac G5 and two Dell Pentium 4 and Xeon-based systems, stating that the tests performed and the way those tests were conducted are all above board.
Far from adjusting the Intel-based machines to yield lower scores, Apple's contract tester, VeriTest, actually chose settings to improve the Dell scores, Apple's VP of hardware product marketing, Greg Joswiak, told Slashdot yesterday.
Apple came under criticism from a number of web sites yesterday after they compared its SPEC CPU 2000 benchmark results with others found on the SPEC web site. SPEC-published scores for one of the comparison machines, Dell's dual-3.06GHz Xeon-based Precision Workstation 650, were rather higher than those published by Apple.
A closer look at VeriTest's findings revealed the company had apparently tweaked the G5 to improve performance and switched off seemingly performance-boosting features like HyperThreading.
According to Joswiak, HT was disabled in the SPECint and SPECfp base tests because the Dell machines scored higher with HT off than with it on. VeriTest did keep HT switched on when it performed its SPECint and SPECfp rate tests.
Indeed, a number of Register readers have pointed out a report on Dell's web site that supports Joswiak's claim. Essentially, it says HT is good for server applications, but less well suited to compute-intensive apps. It uses SPEC CPU 2000 as an example of such an application, and found that "system performance decreased 6-9 per cent on the CPU 2000 speed tests and decreased 27-37 per cent on the CPU 2000 throughput tests" with HT enabled.
"The tests show that CPU 2000 and other serial (non-multithreaded) applications do not benefit from Hyper-Threading and may, in fact, incur a performance penalty because of resource contention issues," the report concludes.
Joswiak also said that the test conducted by VeriTest did make use of the Pentium 4's SSE2 SIMD engine for floating-point operations. Claims that it didn't were based on a mis-reading of the compile flags listed at the end of the VeriTest report. We have to admit we made that mistake, assuming that the flags would distinguish between SSE and SSE2.
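For readers wondering how SSE2 use is controlled at compile time, GCC exposes it through command-line flags. The following invocation is purely illustrative - we don't know VeriTest's exact flag set, and the file names are made up - but it sketches how a GCC 3.3 build might target the Pentium 4's SSE2 unit for floating point:

```shell
# Illustrative only - not the actual VeriTest flag set.
# -march=pentium4 tunes code generation for the P4 and implies SSE2
# support, so an explicit -msse2 need not appear in a flag listing.
# -mfpmath=sse makes GCC do scalar floating-point maths in the
# SSE/SSE2 registers rather than on the legacy x87 stack.
gcc -O3 -march=pentium4 -mfpmath=sse -o benchmark benchmark.c
```

Because -march=pentium4 implies SSE2, a flag listing can enable the SSE2 engine without the string "sse2" ever appearing in it - one plausible route to the kind of mis-reading described above.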
Joswiak admitted that the Dell machines would have scored higher if VeriTest had used Intel's own compilers rather than GCC 3.3, but equally the G5 would have rated higher if Apple had offered alternative PowerPC compilers. As we noted in our report yesterday, VeriTest used GCC on both platforms to make the comparison as close as possible. Joswiak noted that GCC is probably more optimised for x86 than PowerPC, having been available on the Intel platform for longer. He also claimed the scores were higher under Linux - the OS used in the published benchmarks - than under Windows.
Joswiak also promised that the tweaks made to the PowerPC 970 processor in the G5 will make it into shipping systems, so the tests better reflect the experience the average user will get. Memory Read Bypass will be turned on by default in shipping systems, he said, and software pre-fetching will be turned off.
As for the high-performance but less memory-efficient single-threaded malloc (memory allocation) library used in the test, Joswiak couldn't say whether it would be the default in shipping systems.
Nevertheless, his comments go some way to clarifying the issues raised by Apple's performance claims. But, as we noted in our previous report, we still don't see why VeriTest didn't explain all this in its results write-up. Any good experiment report should detail not only what was done and the results achieved, but why particular methodologies were chosen. This VeriTest failed to do.
And while Dell may have detailed why HyperThreading should be disabled in SPEC tests, that's really no excuse for Apple or VeriTest not doing the same. Doubly so, given the ammunition it gives Apple to knock this Intel technology.
What the 'cheating' controversy has at least done is call into question the value of SPEC-based tests. VeriTest may well have done its utmost to negate the effect of compilers on the final results to provide a better machine-to-machine comparison, but what benefit is that to real users? There are clearly too many variables - CPU, OS, architecture, compiler, library, etc. - for such tests to give a meaningful result.
In any case, while Mac/PowerPC fans will laud the results, Intel buffs will claim that their machines are still faster in the real world thanks to their higher clock speeds. AMD supporters, meanwhile, will claim theirs is the fastest chip because Intel only wins when testers use better optimised compilers.
The obvious solution is an agreed set of tests, compilers and operating systems, but as the controversy over BapCo's benchmarks in the CPU world and FutureMark's graphics tests shows, vendors aren't going to support tests that don't give them the results they know they should or want to get.
That's why, as we said yesterday, and many Reg readers have since agreed, we want to see real-world tests based on applications people actually use. Apple's self-selected set of apps, including Blast, HMMer and Photoshop, shows real benefits for users running those packages - not surprising given the widely accepted superiority of the 970's AltiVec SIMD engine over the P4 equivalent, SSE2. But we'd like to see other tests, such as a broad range of games and productivity apps.
As for compiler optimisations, should these be allowed? If software developers are using Intel's compilers to create commercial apps, ie. the programs that users work with, then surely that's what should be used for benchmarks. Yes, it makes it much harder to separate the effect of the CPU, the architecture, the OS and so on from overall system performance, but that's what most users are really interested in: system performance. How well will New Box X run the applications that I need to run?
In any case, superior performance is generally a time-limited feature. If Apple has it now, it will probably lose it again when Intel's Prescott or AMD's Athlon 64 ship. What Apple can say - and its customers can take heart from - is that, thanks to IBM, it has caught up. What really matters is maintaining that broad parity. ®