Top 500 supers: China rides GPUs to world domination
The People's Republic of Petaflops
In the Hopper
The fifth most-powerful super in the world based on the Linpack tests (at least the ones we know about) is a brand new box called Hopper. Installed at the US DOE's National Energy Research Scientific Computing Center, Hopper is a Cray XE6 super using the new Gemini interconnect and twelve-core Opteron 6100 processors, with no fancy schmancy GPU co-processors. (Well, at least not yet, anyway.) Hopper has 153,408 cores spinning at 2.1 GHz and delivers 1.05 petaflops of sustained performance with an efficiency of 82 per cent.
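The 82 per cent efficiency figure is the ratio of sustained Linpack performance (Rmax) to theoretical peak (Rpeak). As a rough back-of-envelope sketch, peak can be estimated as cores times clock times flops per clock. The figure of 4 double-precision flops per clock per Opteron 6100 core is an assumption made here for illustration, not a number from the article; the constant and function names are likewise hypothetical.

```python
# Back-of-envelope estimate of Hopper's Linpack efficiency (Rmax / Rpeak).
# Assumption: each Opteron 6100 core retires 4 double-precision flops per clock.

CORES = 153_408       # core count quoted in the article
CLOCK_GHZ = 2.1       # clock speed quoted in the article
FLOPS_PER_CYCLE = 4   # assumed: one 128-bit SSE add plus one multiply per clock
RMAX_PFLOPS = 1.05    # sustained Linpack figure quoted (rounded) in the article


def peak_pflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Return theoretical peak performance (Rpeak) in petaflops."""
    gflops = cores * clock_ghz * flops_per_cycle  # total gigaflops
    return gflops / 1e6                           # gigaflops -> petaflops


rpeak = peak_pflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = RMAX_PFLOPS / rpeak

print(f"Rpeak ~ {rpeak:.3f} PF, efficiency ~ {efficiency:.1%}")
```

Under these assumptions the estimated peak comes out near 1.29 petaflops, and the rounded 1.05 petaflops sustained figure gives roughly 81.5 per cent; the small gap to the quoted 82 per cent is consistent with the sustained number having been rounded.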
In case it isn't obvious yet, there is a bottleneck in getting the nodes of a parallel supercomputer to talk to one another: traffic has to pass through networking stacks running on their x64 processors and then out over the PCI-Express 2.0 bus. If Nvidia or AMD wants to do something useful, embedding a baby x64 processor inside a GPU co-processor, along with a switchable 10 Gigabit Ethernet or 40 Gb/sec InfiniBand port, would make a very interesting baby server node. Throw in cache coherence between the x64 and GPU processors, and getting to 50 petaflops might not seem like such a big deal.
The Bull Tera-100 super at the Commissariat à l'Energie Atomique in France is based on Intel's Xeon 7500 high-end processors and Bull's bullx supercomputer blades, and ranks sixth in the world. The machine uses QDR InfiniBand to lash the nodes together and is rated at 1.05 petaflops. This machine does not have GPUs in it from either AMD or Nvidia, and neither does number eight, the Kraken XT5 super from Cray that is owned by the University of Tennessee and operated by DOE's Oak Ridge National Laboratory. Kraken delivers 831.7 teraflops of sustained Linpack performance, unchanged from when it came onto the list a year ago.
Number seven on the list, the Roadrunner Opteron blade system at Los Alamos National Laboratory (another DOE site), does use accelerators, but they are IBM's now-defunct Cell co-processors, which are based on IBM's Power cores and have eight vector math units per chip. Roadrunner demonstrated the viability of co-processors for pushing performance up to the petaflops level, but it is stalled at 1.04 petaflops, is probably not going to be upgraded, and is therefore uninteresting, even if it will do lots of good work for the DOE. (If you consider designing nuclear weapons good work, of course.)
Number nine on the list is the BlueGene/P super, named Jugene, built by IBM for the Forschungszentrum Juelich in Germany, which debuted at number three at 825.5 teraflops on the June 2009 list and hasn't changed since then. Rounding out the top ten on the Top 500 list is the Cielo Cray XE6 at Los Alamos, a new box that is rated at 816.6 teraflops of sustained Linpack performance.