Supers get greener
But HPC iron still glows red hot
The Top 500 ranking of the world's supercomputers, put out by a group of performance-loving nerds, came out a few weeks ago. Now a few efficiency-loving nerds have added power-consumption figures to the Top 500, re-sorted the list, and created the Green 500 supercomputer ranking.
The pursuit of hundreds of teraflops and now tens of petaflops comes at a price, and that price includes power consumption and heat dissipation. Some supercomputer architectures are more power-efficient than others, as the Green 500 rankings clearly show.
The Top 500 list is compiled by Erich Strohmaier and Horst Simon of the Lawrence Berkeley National Laboratory, Jack Dongarra of the University of Tennessee, and Hans Meuer of the University of Mannheim. The list has been published twice a year for nearly 17 years, and Dongarra has Linpack Fortran benchmark test results that were run on Jacquard looms back in the 19th century.
OK, I'm exaggerating. Slightly.
The Green 500 list is a much more recent development. Created by Wu-chun Feng and Kirk Cameron of Virginia Tech, this is the fifth ranking, but only the third one that has been made public.
The most efficient supercomputer on the Green 500 list is a fairly modest 2,016-core cluster based on IBM's PowerXCell 8i processors (in its QS22 blade servers) that employs InfiniBand to lash the server nodes together.
This machine, installed at the University of Warsaw's Interdisciplinary Centre for Mathematical and Computational Modeling, is rated at a modest 18.6 teraflops, but because it only burns 34.6 kilowatts of electricity, it comes in at 536.2 megaflops per watt. And that's thanks in large measure to its using the new 4GHz Cell chip all by itself instead of the 3.2GHz variant paired with dual-core Opteron processors, as do a number of other efficient supers.
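The megaflops-per-watt figure is simply sustained Linpack performance divided by power draw. A minimal Python sketch of that arithmetic follows, using the rounded figures quoted above. The function name is ours for illustration, not anything published by the Green 500, and because the inputs are rounded the result lands a shade off the published 536.2.

```python
def megaflops_per_watt(teraflops: float, kilowatts: float) -> float:
    """Energy efficiency in megaflops per watt (hypothetical helper)."""
    megaflops = teraflops * 1e6   # 1 teraflop = 1,000,000 megaflops
    watts = kilowatts * 1e3       # 1 kilowatt = 1,000 watts
    return megaflops / watts


# Rounded figures quoted for the Warsaw Cell cluster: 18.6 teraflops, 34.6 kilowatts
warsaw = megaflops_per_watt(18.6, 34.6)
print(f"Warsaw Cell cluster: {warsaw:.1f} megaflops per watt")
```

The same function can be applied to any other machine on the list for which both a Linpack rating and a power figure are quoted.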
Because Warsaw's Cell-based machine - which ranked number one in the November 2008 Green 500 list as well - is so small in terms of teraflops, it probably won't be on the November 2009 list of Top 500 supers, a fate that will drop it from the related Green 500 rankings.
It won't be the first to be so bumped. Three Cell-based machines installed at Spanish oil company Repsol YPF that once ranked at the top of the Green 500 list disappeared from the June 2009 ranking because they were only rated at 14 teraflops, even though they delivered 530.33 megaflops per watt by burning only 26.4 kilowatts of juice.
It looks like it's time for the Top 500 and Green 500 people to start building a larger list so that efficient machines don't get shaken out of the mix.
As was the case in the prior two Green 500 rankings, IBM's hybrid supers composed of Opteron LS21 blades and QS22 Cell blades dominate the energy-efficient rankings, holding spots two through four.
Notably, IBM's "Roadrunner" hybrid Opteron-Cell super installed at Los Alamos National Laboratory, the first machine to break the one-petaflop barrier, is ranked number four on the Green 500 list, showing that Big Blue can build small machines that deliver good energy efficiency, then scale them up and still offer efficiency.
The Roadrunner box, which is rated at 1.1 sustained petaflops on the Linpack test, burns 2.5 megawatts of electricity for an efficiency of 444.9 megaflops per watt. The smaller Opteron-Cell clusters ranked above Roadrunner on the Green 500 list, installed at IBM's own benchmarking center and at Los Alamos, are rated at 458.3 megaflops per watt.
A whole bunch of BlueGene/P massively parallel clusters of various sizes and installed all over the world occupy slots six through 19 on the Green 500 list, with efficiencies that range from 364 to 371.7 megaflops per watt. The new iDataPlex machines from Big Blue (which have some attributes of blade and rack servers) hold some spots in the list at around 270 megaflops per watt.
The most interesting new machine on the Green 500 list is the Grape-DR cluster, which is a custom supercomputer based on a 256-core chip that was developed by the University of Tokyo, the National Astronomical Observatory of Japan, the Institute of Physical and Chemical Research, and telecom giant NTT.
The Grape-DR machine, which is installed at the observatory, comprises 8,192 Grape-DR chips clocked at a modest 330MHz and runs CentOS Linux. Each, however, delivers 10.3 gigaflops of oomph, allowing the Grape-DR cluster to hit just under 22 teraflops with nearly 2.1 million cores.
The Grape-DR cluster, which is but the latest in a line of custom supers based on custom chips designed in Japan since 1992, only burns 51.2 kilowatts, allowing it to boast a rating of 428.9 megaflops per watt on the Green 500 ranking. Its number-five position slots it right between IBM's hybrid Opteron-Cell boxes and the wall of BlueGene/P machines.
The most efficient x64-only boxes on the list are based on Intel's new Xeon 5500 Nehalem EP processors, and include not only the IBM iDataPlex boxes but also a machine ranked at number 20 built by NEC for the University of Stuttgart, as well as two boxes built by Atipa for two different atomic labs in the States that almost certainly will be knocked from the list next time around if they're not upgraded.
Cray XT4 and XT5 and SGI Altix ICE parallel supers, as well as a mix of BlueGene and iDataPlex machines from IBM, dominate the top 100 on the Green 500 list, along with a smattering of clusters built from x64 servers from Dell, Sun Microsystems, and Fujitsu. There is not one HP machine in the top fifth of the Green 500 rankings.
Averaged across the entire Green 500 list, the researchers who put together this power ranking say that the efficiency of the machines increased by 10 per cent, from 98 to 108 megaflops per watt, since the November 2008 ranking, while the aggregate power of all the boxes on the list also increased by 15 per cent, from 200 to 230 megawatts.
The power bill is not going down so much as the performance levels are going up.
In the November 2008 Top 500 and Green 500 lists, the aggregate performance of the machines ranked was just under 17 petaflops, but in the June 2009 lists it rose to 22.6 petaflops. In general, Feng and Cameron say that there are more machines above the 200 megaflops per watt threshold and fewer machines below the 50 megaflops per watt level. Moreover, quad-core and six-core processors are helping to boost the energy efficiency of clusters.
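To see why performance, rather than the power bill, is doing the heavy lifting, here is a rough Python sketch that compares the growth in aggregate performance with the growth in aggregate power, using the figures quoted above. It treats "just under 17 petaflops" as 17.0, which is an approximation, and the helper names are ours. Note that dividing total flops by total watts weights big machines more heavily, so the result is not the same statistic as the 98 and 108 megaflops per watt list averages.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in per cent."""
    return (new - old) / old * 100.0


def aggregate_mflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Total megaflops divided by total watts across a whole list."""
    return (petaflops * 1e9) / (megawatts * 1e6)  # 1 PF = 1e9 MF, 1 MW = 1e6 W


# Aggregates quoted in the article (17.0 petaflops is an approximation)
perf_growth = percent_change(17.0, 22.6)     # aggregate petaflops
power_growth = percent_change(200.0, 230.0)  # aggregate megawatts

nov_2008 = aggregate_mflops_per_watt(17.0, 200.0)
jun_2009 = aggregate_mflops_per_watt(22.6, 230.0)

print(f"Performance growth: {perf_growth:.1f} per cent")
print(f"Power growth:       {power_growth:.1f} per cent")
print(f"Aggregate efficiency: {nov_2008:.1f} -> {jun_2009:.1f} megaflops per watt")
```

On these rounded inputs, aggregate performance grows roughly twice as fast as aggregate power, which is the point the researchers are making.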
The absolute worst supercomputer on the Green 500 ranking is installed at an unnamed IT service provider that has six different blade server clusters based on various generations of x64 blade servers from HP. The cluster in question, which is ranked at 311 on the Top 500 supers list, has 8,192 Opteron cores using 2.4GHz dual-core Opterons and is rated at just over 21 teraflops. Unfortunately, the Green 500 power experts reckon that this box burns 1.6 megawatts of juice, giving it a rating of a hair over 13 megaflops per watt.
That's twice as bad in terms of energy efficiency as the Sun Tsubame Opteron blade cluster located at TiTech in Japan, which uses ClearSpeed co-processors to boost the number-crunching power, but which nonetheless ranks at an embarrassing 494 on the Green 500 list because it consumes 3.3 megawatts to deliver its 87 teraflops, for a rating of 26.4 megaflops per watt.
The Cray XT4 installed at the University of Edinburgh also ranked poorly in terms of energy efficiency, with the 54.65 teraflops Opteron cluster burning 2.6 megawatts of juice and delivering only 21 megaflops per watt. ®
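For the curious, the three laggards can be lined up with the same arithmetic. The Python sketch below uses the rounded figures quoted above, takes "just over 21 teraflops" as 21.0 (an approximation), and uses labels and a helper name of our own choosing.

```python
def megaflops_per_watt(teraflops: float, megawatts: float) -> float:
    """Energy efficiency in megaflops per watt (hypothetical helper)."""
    return (teraflops * 1e6) / (megawatts * 1e6)  # 1 TF = 1e6 MF, 1 MW = 1e6 W


# (label, teraflops, megawatts) as quoted in the article, rounded
laggards = [
    ("HP Opteron blade cluster (unnamed provider)", 21.0, 1.6),
    ("Sun Tsubame, TiTech", 87.0, 3.3),
    ("Cray XT4, University of Edinburgh", 54.65, 2.6),
]

# Sort from least to most efficient
ranked = sorted(
    ((name, megaflops_per_watt(tf, mw)) for name, tf, mw in laggards),
    key=lambda pair: pair[1],
)

for name, efficiency in ranked:
    print(f"{name}: {efficiency:.1f} megaflops per watt")
```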