Microserver chip performance smackdown
Tilera v Atom v ARM, with worried Xeon watching
There's been a lot of talk about how well future servers based on Atom, ARM, Tilera, and other processors might compete against workhorse Xeon or Opteron servers.
What there hasn't been a lot of is benchmark data that shows how those so-called microserver processors stack up – until now.
Tilera threw a coming-out party for its Tile-Gx3000 line of many-cored processors last week at the Structure 2011 conference in San Francisco.
After some pestering from El Reg, the company provided some initial performance specs on its processors, showing how they stack up against ARM and Atom chips.
The Tilera circuits are technically known as system-on-chip (SoC) designs, and they pack cache memories, main-memory controllers, and I/O subsystems onto a single die along with 36, 64, or 100 64-bit RISC cores.
Tilera has seen some uptake among network equipment providers for its earlier TilePro chips, and is hoping to break into the server racket big time with the Tile-Gx3000s, which will sample this summer in a 36-core variant and move up to 64-core and 100-core versions early next year – if all goes well.
Even though the Tile-Gx3000 chips are not yet shipping, Tilera has put the chip through its paces on the CoreMark benchmark test, a relatively new assessment that was created to replace the popular Dhrystone that's been used for oh-so-many years.
CoreMark, as its name suggests, is not intended to measure the relative performance of whole systems, but rather to stress-test the pipeline structure of a processor's cores and to gauge how well a chip handles integer, read/write, and control operations.
The benchmark is written in C, and includes matrix math, sorting algorithms, and other functions. There's also a cyclic redundancy check based on calculated variables, which ensures that compiler makers can't get in cahoots with chip makers and pre-compute results to artificially goose test scores.
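To see why a checksum over calculated values makes pre-computing results hard, consider the following minimal Python sketch. It is not CoreMark's code: CoreMark is written in C and uses its own CRC routine, whereas this toy uses `zlib.crc32` as a stand-in, and the names `run_workload` and `validate` are hypothetical. The idea is simply that the data depends on a seed supplied at run time, and every computed value is folded into the checksum, so the work has to actually be done to get the right answer.

```python
import zlib


def run_workload(seed: int, size: int = 64) -> int:
    """Toy stand-in for a benchmark kernel (not CoreMark itself).

    Builds data from a run-time seed, sorts it, and folds every
    sorted value into a running CRC-32.
    """
    # Simple linear congruential generator so the data depends on the seed.
    state = seed & 0x7FFFFFFF
    data = []
    for _ in range(size):
        state = (1103515245 * state + 12345) & 0x7FFFFFFF
        data.append(state % 1000)

    data.sort()  # the "work" being benchmarked

    crc = 0
    for value in data:
        # zlib.crc32 accepts a starting value, so the CRC accumulates.
        crc = zlib.crc32(value.to_bytes(4, "little"), crc)
    return crc


def validate(seed: int, expected_crc: int) -> bool:
    """A run only counts if its checksum matches a reference run."""
    return run_workload(seed) == expected_crc


if __name__ == "__main__":
    reference = run_workload(seed=2011)
    print("run is valid:", validate(2011, reference))
```

Because the checksum depends on values that are only known once the program runs, a compiler cannot simply substitute a canned result for the computation.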
Here's the chart that Tilera's director of cloud computing applications, Ihab Bishara, provided to give a sense of the relative CoreMark performance of two Tilera chips versus an Nvidia Tegra2 ARM processor and an Intel Atom N270. Before you peruse it, note that the Tile-Gx and Atom N270 tests were conducted by Tilera and have not been submitted to the official CoreMark site.
Tiny cores running the CoreMark benchmark test
Tilera has normalized the performance results per core in the above comparison. As you can see, the performance of the Tilera, ARM, and Atom cores is very similar on a per-core basis. But there is more to the story than this chart shows.
The first Tilera chip in the chart is the TilePro64, which also appears in CoreMark's official benchmark test rankings. It has 64 cores running at 866MHz, 62 of which ran the benchmark test. With the GCC compiler stack and a homegrown Linux, this TilePro64 chip achieved a CoreMark score of 145,154, by far the highest of any processor in the official CoreMark rankings.
Tilera didn't say which of the new Tile-Gx3000 series processors it had tested to make the chart, but presumably it shows where a 64-core Tile-Gx3064 would end up with 62 cores dedicated to the same task: just under 3,500 CoreMarks per core.
The Tegra2 chip, which is based on the 32-bit Cortex-A9 core from ARM Holdings and which spins at 1GHz, was able to do 5,866 CoreMarks on the test, which works out to 2,933 CoreMarks per core.
For reasons that will become apparent in a minute, Tilera decided to run a test on an Atom N270 processor running at 1.6GHz. The N270 is a single-core, two-threaded processor; with HyperThreading turned off, it achieved just under 3,000 CoreMarks.
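For anyone who wants to check the per-core normalization, here is a minimal Python sketch that uses only the totals and core counts quoted above. The TilePro64 per-core figure is derived here rather than quoted by Tilera, and the helper name `per_core` is merely illustrative.

```python
# Per-core CoreMark normalization using figures quoted in the article.
# Each entry maps a chip to (total CoreMark score, cores that ran the test).
results = {
    "TilePro64 (866MHz)": (145_154, 62),
    "Tegra2 (1GHz)": (5_866, 2),
}


def per_core(total_score: int, cores: int) -> int:
    """Return CoreMarks per core, rounded to the nearest whole number."""
    return round(total_score / cores)


if __name__ == "__main__":
    for chip, (score, cores) in results.items():
        print(f"{chip}: {per_core(score, cores):,} CoreMarks per core")
```

The single-core Atom N270 needs no normalization, since its total score is also its per-core score.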
Performance per watt anxiety
The real competition that Tilera faces as it tries to move into servers is not ARM processors, but faster Atom chips and low-powered Xeons.
The dual-core, four-thread Atom N330 running at 1.6GHz, which is now entering its end-of-life phase, rates 9,050 CoreMarks, or 4,525 per core. At 8 watts per chip, that works out to 1,131 CoreMarks per watt. That sounds pretty good until you do the math on a 64-core Tile-Gx3064, which is rated at 35 watts and which should yield around 6,200 CoreMarks per watt.
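The math is easy enough to reproduce. The sketch below is a rough check rather than anything Tilera supplied, and it rests on one assumption: that the Tile-Gx3064 estimate comes from 62 active cores at roughly 3,500 CoreMarks apiece, which is what the 6,200 figure implies. The variable and function names are illustrative.

```python
def coremarks_per_watt(total_score: float, watts: float) -> float:
    """Total CoreMark score divided by the chip's rated power."""
    return total_score / watts


# Atom N330: figures quoted in the article.
atom_total = 9_050
atom_watts = 8
atom_cpw = coremarks_per_watt(atom_total, atom_watts)

# Tile-Gx3064: ASSUMPTION -> 62 cores running the test at ~3,500 each.
tile_total = 62 * 3_500
tile_watts = 35
tile_cpw = coremarks_per_watt(tile_total, tile_watts)

if __name__ == "__main__":
    print(f"Atom N330:   {atom_cpw:,.2f} CoreMarks per watt")
    print(f"Tile-Gx3064: {tile_cpw:,.2f} CoreMarks per watt (estimated)")
    print(f"Advantage:   {tile_cpw / atom_cpw:.1f}x")
```

On those assumptions, the Tilera part comes out roughly five and a half times ahead of the Atom on CoreMarks per watt.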
Perhaps more interestingly, CoreMark was also run on a Fujitsu RX300 S6 blade server with two of Intel's low-voltage 2.26GHz Xeon L5640 processors, each of which has six cores and a dozen threads.
With all 24 threads turned on, the RX300 blade server delivered 118,572 CoreMarks, or 4,941 CoreMarks per thread. The Xeon L5640s are rated at 60 watts each, and so – not including the Xeon 5500 chipset – they delivered a dinky 988 CoreMarks per watt.
That's a factor of six worse than the expected bang-for-the-watt that the Tile-Gx3000 series will deliver. Some of the Xeon processors tested on CoreMark have done better in terms of delivering more CoreMarks per core or thread, but they also did a lot worse on power efficiency as gauged by CoreMarks per watt.
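The Xeon figures can be checked the same way. This short Python sketch uses the numbers quoted above, counts only the two processors' rated power (no chipset), and takes the roughly 6,200 CoreMarks per watt estimate for the Tile-Gx3064 as given. The variable names are illustrative.

```python
# Fujitsu RX300 S6 with two Xeon L5640s: figures quoted in the article.
xeon_total = 118_572
threads = 24
sockets = 2
watts_per_socket = 60  # processors only, chipset excluded

per_thread = xeon_total / threads
per_watt = xeon_total / (sockets * watts_per_socket)

# Estimated Tile-Gx3064 efficiency, taken from the article as given.
tile_per_watt = 6_200
ratio = tile_per_watt / per_watt

if __name__ == "__main__":
    print(f"Xeon L5640 pair: {per_thread:,.1f} CoreMarks per thread")
    print(f"Xeon L5640 pair: {per_watt:,.1f} CoreMarks per watt")
    print(f"Tile-Gx3064 advantage: {ratio:.2f}x")
```

The ratio lands a little above six, which squares with the "factor of six" claim.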
Obviously, CoreMark is a CPU core benchmark, not a system-level benchmark, and it will be interesting to see how all of these processors line up as they chase hyperscale workloads.
But it's already clear that future Xeon and Atom processors have to do better to keep the Tilera and ARM camels from getting their noses under the data center tents. ®