Top 500 supers - world yawns at petaflops
Not the norm. But getting there
The annual International Supercomputing Conference kicked off this morning in Hamburg, Germany, with the announcement of the 33rd edition of the Top 500 supercomputer rankings. While petaflops-scale machines are far from normal, they soon will be.
Not surprisingly, HPC vendors and academics are gearing up to try to push performance up by three orders of magnitude to break through the exaflops barrier - something that will take radically different server and network fabric designs and plenty of time to accomplish. But in the meantime, everyone is trying to show they can break the petaflops barrier, and soon, they will be breaking the 10 petaflops barrier.
With the June 2009 ranking, the home team in Germany - which has two monster machines in the top ten this time around - will be celebrating. Well, as much as supercomputer nerds celebrate. (We know you are really using the new Jugene and Juropa supers to play video games, at least when the administrators aren't looking. Let's hope the game is not global thermonuclear war).
The Forschungszentrum Juelich (FZJ) has been on a buying binge this year, upgrading its two supercomputers so it can lay claim to being the floppiest supercomputer center in Europe. The Jugene BlueGene/P system that FZJ bought from IBM packs together 294,912 PowerPC 450 cores running at 850 MHz, using a proprietary BlueGene interconnect to deliver 825.5 teraflops of oomph for various research projects, giving it the number three position on the Top 500 list. It runs SUSE Linux - as if you expected anything else.
Down the hall at FZJ is a hybrid machine made by Bull and Sun Microsystems, called Juropa, which comprises a mix of Bull NovaScale R422-E2 rack servers and Sun's X6275 blade servers, all linked together using the new quad data rate InfiniBand switches from Mellanox. It's ranked at number ten on the list. (Those Mellanox switches were the final nail in interconnect maker Quadrics' coffin, since the Juropa prototype used its products and the final machine did not).
The Juropa nodes all use Intel's quad-core Xeon 5500 processors (formerly known as "Nehalem EP" or "Gainestown" if you track code names) and run SUSE Linux. The combined bits of the Juropa machine have 26,304 cores in total and were rated at 274.8 teraflops on the Linpack Fortran test, which means 89.1 per cent of the peak theoretical performance of the processors was delivered when the Fortran test was run. The Jugene machine has an efficiency of about 82.3 per cent on the Linpack test.
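Linpack efficiency is just the measured result divided by theoretical peak, so the peak of each FZJ machine can be backed out from the numbers quoted above. Here is a minimal sketch in Python that uses only this article's figures, taking Juropa's rating as 274.8 teraflops. The function name is our own invention, and the results are approximate because the quoted percentages are rounded.

```python
def implied_peak_teraflops(linpack_teraflops, efficiency_percent):
    """Back out theoretical peak from a Linpack result and its efficiency."""
    return linpack_teraflops / (efficiency_percent / 100.0)


# (Linpack teraflops, per cent of peak) as quoted in the article.
systems = {
    "Juropa": (274.8, 89.1),
    "Jugene": (825.5, 82.3),
}

for name, (linpack, efficiency) in systems.items():
    peak = implied_peak_teraflops(linpack, efficiency)
    print(f"{name}: roughly {peak:.0f} teraflops of theoretical peak")
```

By this rough reckoning Jugene already has a petaflops of peak performance on paper, even though its Linpack number falls short of that mark.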
The Top 500 supercomputer list comes out twice a year, giving food for thought to the two major HPC events of the year, Supercomputing in North America and ISC in Europe. The list is maintained by Erich Strohmaier and Horst Simon, computer scientists at Lawrence Berkeley National Laboratory, Jack Dongarra of the University of Tennessee, and Hans Meuer of the University of Mannheim. The ranking is based on the installed machine running the Linpack Fortran benchmark test created by Dongarra and colleagues Jim Bunch, Cleve Moler, and Pete Stewart back in the 1970s to gauge the relative performance of computers of all stripes and sizes on numerical calculations.
The two machines at the top of the June 2009 ranking are exactly the same as they were on the November 2008 list. Number one is IBM's hybrid Opteron-Cell "Roadrunner" machine, which the U.S. Department of Energy has installed at Los Alamos National Laboratory. The machine is currently using dual-core 1.8 GHz Opteron chips and 3.2 GHz PowerXCell 8i co-processors, delivering 1.1 petaflops of number-crunching power (the same performance it had last November). Roadrunner has 129,600 processor cores in total and runs at about 75.9 per cent of peak theoretical throughput. (Moving up to faster 40 Gb/sec InfiniBand switches would probably boost performance on Roadrunner without adding cores to the box).
Number two on the Top 500 is the "Jaguar" Cray XT5 cluster installed at the DOE's Oak Ridge National Laboratory, which is made from 37,538 of Advanced Micro Devices' quad-core "Shanghai" processors running at 2.3 GHz and delivering 1.06 petaflops of oomph. It too had the same ranking late last year. (That's because the heavy workload that Jaguar has been under has not allowed it to be retested, according to Strohmaier).
The "Pleiades" Altix ICE 8200 cluster made by Silicon Graphics (the old one, not the new one that is really Rackable Systems with the old SGI product line added in) for NASA's Ames Research Center is ranked at number four on the list with 487 teraflops. That is the same rating it had six months ago, but Jugene bumped it down a spot. The number five box on the ranking - IBM's BlueGene/L massively parallel box installed at Lawrence Livermore National Laboratory and the number one machine on the November 2007 list when it debuted - was still rated at 478.2 teraflops.
There are two more BlueGene/P systems in the top ten, which are kickers to this BlueGene/L and siblings to the larger Jugene machine at FZJ.
Number six on the Top 500 list this time around is "Kraken," a sibling to Jaguar that is also an XT5 machine from Cray and is installed at the University of Tennessee. It has 66,000 cores, is rated at 463.3 teraflops, and is the most powerful supercomputer installed at a university anywhere in the world.
Number seven on the list is a BlueGene/P box installed at Argonne National Laboratory, which was upgraded a smidgen to 458.6 teraflops but which still fell two spots in the ranking. Number eight on the list is "Ranger," the parallel machine built by Sun Microsystems using its X6420 blade servers with quad-core Shanghai Opterons running at 2.3 GHz and linked by Sun's "Magnum" InfiniBand DDR switches. The Ranger cluster has a total of 62,976 cores and it's rated at 433.2 teraflops.
Rounding out the top ten, at number nine on the list, is a machine named "Dawn," a companion BlueGene/P box that sits next to that BlueGene/L box at Lawrence Livermore National Laboratory. It's rated at 415.7 teraflops.
Other new and notable machines on the list include a 185.2 teraflops BlueGene/P super sold by IBM to the King Abdullah University of Science and Technology in Saudi Arabia, ranked number 14 on the list, and a 180.6 teraflops cluster called "Magic Cube" at the Shanghai Supercomputer Center, the largest machine on the list equipped with Microsoft's Windows HPC Server 2008 operating system. This system was built by Chinese server maker Dawning and was on the list as of last November.
And the winner is...IBM. Or HP
So those are the big machines. Now let's talk about trends, the interesting part of the Top 500 ranking. There are 410 clusters on the list this time around, which comprise 58.6 per cent of the aggregate 22.61 petaflops on the list. There are 88 massively parallel machines, which get another 40.9 per cent of the oomph. And there are two so-called constellation architecture boxes (not to be confused with Sun's product line) that make up a fraction of a per cent of the capacity.
By vendor, IBM is the winner when ranked by teraflops and Hewlett-Packard is the winner when ranked by systems. IBM has 188 systems on the June 2009 supers list, for a total of 37.6 per cent of machines, but at 8.9 petaflops across all of those machines, it has a 39.4 per cent share of capacity. HP has 212 machines on the list this time around (42.4 per cent of machines), but only 5.68 petaflops of capacity all told (25.1 per cent of the total).
It has been a long, long time since HP was anywhere near the top of this ranking. In fact, it was called Compaq when it was near the top of the list, so that doesn't really count. Cray has 20 machines on the list, a mere 4 per cent share of boxes, but they add up to 3.09 petaflops, giving Cray boxes a 13.7 per cent share of capacity. SGI has the same number of boxes as Cray, but half the capacity at 1.5 petaflops.
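The vendor shares above are easy to check for yourself: divide each vendor's system count by 500 and its capacity by the 22.61 petaflops aggregate. Here is a minimal sketch in Python using the figures quoted in this article. The names are our own, and the capacity shares can be off by a tenth of a point because the petaflops figures are themselves rounded.

```python
TOTAL_SYSTEMS = 500
TOTAL_PETAFLOPS = 22.61  # aggregate capacity quoted in the article

# (systems on the list, petaflops) per vendor, as quoted in the article.
vendors = {
    "IBM": (188, 8.9),
    "HP": (212, 5.68),
    "Cray": (20, 3.09),
}


def shares(systems, petaflops):
    """Return (share of systems, share of capacity), both in per cent."""
    system_share = 100.0 * systems / TOTAL_SYSTEMS
    capacity_share = 100.0 * petaflops / TOTAL_PETAFLOPS
    return system_share, capacity_share


for name, (count, capacity) in vendors.items():
    system_share, capacity_share = shares(count, capacity)
    print(f"{name}: {system_share:.1f} per cent of systems, "
          f"{capacity_share:.1f} per cent of capacity")
```

Run it and the gap between the two rankings is plain: HP leads on box count, while IBM and Cray punch well above their share of boxes on capacity.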
Dell has 14 machines plus two it shares with Sun, which has five of its own boxes on the list plus another it shares with NEC. Bull and Appro have a handful of machines on the list, and Fujitsu, Hitachi, and NEC have a couple each with a few more vendors having one or two boxes as well as two that are self-made.
In terms of interconnect, there are 282 machines that are based on Gigabit Ethernet, another 151 that are based on various speeds of InfiniBand, three using SGI's NUMAlink interconnect, ten using Myrinet, and twenty using one speed or another of Cray's XT interconnect. There are four machines using IBM's proprietary SP switches, three still using Quadrics, and 42 boxes using some proprietary form of interconnect.
All told, the machines in the Top 500 list have 4.1 million processor cores. But not all cores are created equal in terms of performance or power consumption. There are 336 machines on the list based on Intel's quad-core Xeon processors (of the Clovertown, Harpertown, and new Nehalem EP Gainestown varieties), and if you include six Itanium machines and a bunch of other Xeons, there are 396 machines on the list that say Intel Inside. There are 13 boxes using dual-core Opterons, 28 using quad-core Opterons, and two using six-core Opterons, for a total of 43 boxes.
IBM's Power family of enterprise-class server chips accounts for 22 machines and its PowerPC chips (used in the BlueGene line) are in 29 machines. Another four machines (including the Roadrunner Los Alamos box) get the bulk of their processing from Power Cell chips. There's a smattering of other chips on the list: one Sparc box, one NEC vector box, and a custom machine using a chip called Grape-DR. The Power and Opteron processors have lost ground on the Top 500, while Intel Xeons have gained ground.
One of the interesting bits about the June 2009 list is that slightly more than half of the boxes (256 machines) were installed in 2009, and another 194 were installed in 2008. There may be an economic meltdown, but governments, universities, and academic labs are not being shy about spending dough on new-fangled supers. And that is pretty much worldwide. There are 299 machines on the list installed in the North America region, 145 installed in Europe, 49 in Asia, 6 in Oceania, and one in Central America. It is hard to believe that South America does not have one machine on the Top 500 list, but with Brazil being such an economic powerhouse these days, that will no doubt change soon.
To even get on the Top 500 list this time around, you have to build a machine with at least 17.1 teraflops of oomph. It was 12.6 teraflops back in November 2008. The aggregate capacity on the list was just under 17 petaflops in November, and now it has grown to 22.6 petaflops. ®