Original URL: https://www.theregister.com/2003/06/24/apple_accused_of_cheating_over/

Apple accused of cheating over G5 benchmarks

SPEC vs SPEC

By Tony Smith

Posted in Personal Tech, 24th June 2003 15:54 GMT

Benchmark results cited by Apple at the launch of its Power Mac G5 desktops yesterday have already come under fire for appearing not only to tweak the Mac test system to boost its performance beyond anything an ordinary user might experience, but also to cripple rival systems so that they deliver below-par performance for the average user.

The tests described by Apple CEO Steve Jobs were conducted on the company's behalf ("under contract") by VeriTest. The benchmarks used are the SPEC CPU 2000 integer and floating-point tests. Apple asked VeriTest to compare a pre-release dual 2GHz Power Mac G5 with a Dell Precision 650 workstation based on twin 3.06GHz Intel Xeon CPUs and a Dell Dimension 8300 based on a 3GHz Pentium 4.

The Dells were running Red Hat Linux 9.0; the G5, Mac OS X 10.2.7. The test software was compiled using GCC 3.3 and NAGware Fortran 95.

VeriTest recorded SPECint base scores of 800, 889 and 836 for the G5, the 8300 and the 650, respectively. The equivalent SPECfp base scores were 840, 693 and 646. So the G5 outperforms the other machines, yes?

Well, so says Apple, but a closer look at VeriTest's documentation, freely available from its web site, suggests otherwise.

Certainly the figures published on SPEC's own web site do, as Register readers - along with readers of a number of other web sites - noted today. The Dell-provided SPECint and SPECfp base results for the Precision 650 are 1089 and 1053, respectively. Equivalent figures for the Dimension 8300 are not available.

That puts Apple's figures in a new light. On one hand we have figures that suggest the 2GHz G5 outperforms the 3GHz Xeon in certain benchtests, and on the other we have numbers that show the exact opposite. What gives?

Firstly, Dell's own figures were calculated using a different host operating system and compilers: Windows XP Pro, Intel's own C++ and Fortran compilers, and the MicroQuill SmartHeap Library 6.01. Secondly, the compiler used by VeriTest, GCC, is said to generate code that is less well optimised for x86. Thirdly, VeriTest seems to have adjusted the test hardware to favour the G5. Again, all the details are there in the documentation.

VeriTest admits it used an Apple-supplied tool to adjust the G5 processor's registers "to enable Memory Read Bypass" and "to enable the maximum of eight hardware prefetch streams and disable software-based pre-fetching". The company also installed a "high performance, single-threaded malloc library... geared for speed rather than memory efficiency". That, says VeriTest, "makes it unsuitable for many uses".

We'd guess these are hardly standard system configurations.
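
To see why a "single-threaded malloc library... geared for speed rather than memory efficiency" is quick but, in VeriTest's own words, "unsuitable for many uses", here is a minimal sketch of the general idea: a lock-free bump allocator that never reclaims memory. It is purely illustrative - the function names, pool size and design are our own, not those of the library Apple supplied.

    /* Illustrative sketch only: a single-threaded, speed-over-efficiency
     * allocator of the general kind described in VeriTest's report. */
    #include <stdio.h>
    #include <stddef.h>

    #define POOL_SIZE (64 * 1024 * 1024)   /* one big pool, never shrunk */

    static union {
        long double align;                 /* forces worst-case alignment */
        unsigned char bytes[POOL_SIZE];
    } pool;
    static size_t offset = 0;              /* no lock: single-threaded only */

    static void *fast_malloc(size_t size)
    {
        size = (size + 15) & ~(size_t)15;  /* round up, keep blocks aligned */
        if (offset + size > POOL_SIZE)
            return NULL;                   /* pool exhausted: no fallback */
        {
            void *p = pool.bytes + offset;
            offset += size;                /* just bump a pointer: very fast */
            return p;
        }
    }

    static void fast_free(void *p)
    {
        (void)p;  /* memory is never reclaimed: fine for a short benchmark
                     run, hopeless for a long-lived application */
    }

    int main(void)
    {
        int i;
        double *v = fast_malloc(1000 * sizeof *v);
        if (v == NULL)
            return 1;
        for (i = 0; i < 1000; i++)
            v[i] = i * 0.5;
        printf("v[999] = %f\n", v[999]);
        fast_free(v);
        return 0;
    }

Allocation is a pointer bump and "free" does nothing, so benchmark hot loops run faster - at the cost of thread safety and of memory that is never given back.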

VeriTest also says it tweaked the Dell boxes. For example, it disabled HyperThreading for the SPECint and SPECfp rate tests, though it enabled it for the base SPECint and SPECfp tests. And while the compilers were set to optimise code for the Pentium 4, SSE 2 instructions were not used to speed up floating-point maths operations; only SSE 1 instructions were enabled. VeriTest provides no clear rationale for these choices.
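
The SSE choice matters more than it might seem: SSE handles only single-precision floating point, while it was SSE 2 that added double precision, and SPEC's floating-point suite leans heavily on double-precision code. The sketch below - our own illustration, not VeriTest's actual build, and with GCC flags we believe apply to GCC 3.3 rather than anything taken from the report - shows the kind of loop at stake.

    /* Illustrative only: a double-precision loop of the sort SPECfp 2000
     * spends its time in. With GCC 3.3 on a Pentium 4, invocations along
     * the lines of
     *
     *   gcc -O3 -march=pentium4 -msse  -mfpmath=sse kernel.c   (SSE 1 only)
     *   gcc -O3 -march=pentium4 -msse2 -mfpmath=sse kernel.c   (SSE 2)
     *
     * differ in one key respect: SSE 1 covers only single-precision floats,
     * so without SSE 2 the doubles below fall back to the older x87 unit. */
    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        int i;
        double sum = 0.0;
        for (i = 0; i < N; i++) {
            a[i] = i * 0.001;
            b[i] = (N - i) * 0.002;
        }
        /* Multiply-add loop: double precision throughout, so SSE 2
         * (not SSE 1) is what governs its floating-point speed. */
        for (i = 0; i < N; i++) {
            c[i] = a[i] * b[i] + sum;
            sum += c[i] * 1e-9;
        }
        printf("sum = %f\n", sum);
        return 0;
    }
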

Without a clear rationale, both VeriTest and Apple are today being widely accused of cheating, and not only by x86 fans eager to see the G5 knocked a peg or two back down the CPU performance ladder. To be fair, at least Apple and VeriTest tell you what they've done, which is more than can be said for the vendor-supplied figures on SPEC's web site - what tweaks have those vendors applied to boost their own scores? But Apple and VeriTest should also say why.

VeriTest's testing appears to be an attempt to make this apples-and-oranges comparison as even-handed as possible, but a closer look suggests it has failed to do so.

As we noted in our story on Apple's G5 introduction, we await independent, real world tests of the new Power Mac G5's performance. Only then will Mac users - and everyone else, for that matter - get a truly worthwhile comparison between platforms. ®