Italian 'Eurora' supercomputer pushes the green envelope
Besting Cray and IBM in the energy efficiency game
The "Eurora" supercomputer that was just fired up in Italy may not be large, but it has taken the lead in energy efficiency over designs from big HPC vendors like Cray and IBM.
The new machine was built by Eurotech, a server maker with HPC expertise that is based in Amaro, Italy, in conjunction with graphics chip and GPU coprocessor maker Nvidia. Based on initial Linpack Fortran benchmark tests, it appears to be more energy efficient than either IBM's massively parallel Power-based BlueGene/Q machine or the hybrid ceepie-geepie XK7 machine from Cray.
Eurotech sells Intel Xeon and Advanced Micro Devices Opteron servers, and in the case of Eurora, the company is matching its Aurora Tigon – as in part tiger, part lion – water-cooled servers with a special variant of Nvidia's "Kepler2" Tesla K20X GPU coprocessors, which were announced last November. The machine is being installed at the Cineca supercomputer center in Bologna, which is a member of the Partnership for Advanced Computing in Europe (PRACE) effort in the European Union to push toward exascale computing.
The Aurora Tigon servers are a blade design based on Intel's Xeon E5-2600 processors. Instead of heat sinks, the processors have flat metal plates where water blocks can be attached to take heat away with water that is at tap temperature. Eurotech calls this hot water cooling, but it is really warm water cooling.
Nvidia is shipping Eurotech a special version of the Tesla K20X GPU coprocessor that also has a metal plate instead of a heat sink, and Eurotech has designed its own system board so it can have two processor sockets, main memory, and two GPU coprocessors all on the same thin board.
This is a similar approach to what Russian supercomputer maker T-Platforms did back in the summer of 2011 when it was building a blade server for Moscow State University, although in that case the blade had two low-voltage Xeon L5630 processors and two X2070 embedded GPU coprocessors. Water blocks go on all four computing elements to take the heat away rapidly and efficiently.
Sumit Gupta, general manager of the Tesla Accelerated Computing business unit at Nvidia, says that the custom K20X part does not have a name and that not just anybody can get these parts. The trick for Eurora that has driven up efficiency, he tells El Reg, is that Cineca has figured out how to take one eight-core Xeon E5 processor on the Eurora blade and have it drive both GPU coprocessors, thus leaving the other CPU on each blade free to do other calculations. On the Linpack run done by Cineca, only one of the two CPUs on each blade was used to drive the GPUs.
This raises the question of why there are two sockets on the server at all if a single CPU can drive two GPUs. It comes down to legacy software support. Even if you have new-fangled apps that can run in ceepie-geepie mode, that doesn't mean all of your applications have been ported, and you still need to run the rest in CPU-only mode.
The Eurora supercomputer built by Eurotech and Nvidia
This very modest yet highly efficient machine has a total of 64 compute nodes, each with two Xeon E5 processors and two of the custom K20X GPUs. This machine had a bottle of bubbly broken on it earlier in the week at Cineca, and had been tested to run the Linpack test (on which the Top500 and Green500 supercomputer rankings are based) at a sustained 110 teraflops of number-crunching performance. This was accomplished consuming 34.7 kilowatts, which yields a very impressive 3,150 megaflops per watt.
If you look at the November 2012 Green500 supercomputer rankings, you will see that a hybrid Xeon E5-Xeon Phi cluster called "Beacon" based on Cray/Appro's GreenBlade delivers 2,499 megaflops per watt and currently is the most energy-efficient supercomputer in the world.
The top-end "Titan" XK7 machine at Oak Ridge National Laboratory, which burns 8.21 megawatts and delivers 17.59 petaflops sustained performance on Linpack, yields 2,143 megaflops per watt. This is the most powerful (in terms of oomph) machine in the world.
The former champ of energy efficiency, IBM's BlueGene/Q, is coming in at 2,102 megaflops per watt on the "JuQueen" super at Forschungszentrum Juelich and a little less than that on the four times as large (and more powerful) "Sequoia" machine at Lawrence Livermore National Laboratory.
It doesn't take a supercomputer to see that 3,150 megaflops per watt is a big leap – about 47 per cent more power efficiency than the Titan machine.
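For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes megaflops per watt and the relative gains from the figures quoted above. The function names are made up for illustration, and the inputs are the rounded numbers in this article rather than official Green500 submissions, so the results only approximate the quoted values. Dividing 110 teraflops by 34.7 kilowatts, for example, gives roughly 3,170 megaflops per watt rather than exactly 3,150, presumably because the published inputs are rounded.

```python
def megaflops_per_watt(teraflops: float, kilowatts: float) -> float:
    """Convert sustained teraflops and kilowatts into megaflops per watt."""
    megaflops = teraflops * 1e6  # 1 teraflops = 1,000,000 megaflops
    watts = kilowatts * 1e3      # 1 kilowatt = 1,000 watts
    return megaflops / watts


def percent_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


# Megaflops-per-watt figures as quoted in the article.
QUOTED = {"Eurora": 3150, "Beacon": 2499, "Titan": 2143, "JuQueen": 2102}

if __name__ == "__main__":
    # Recompute efficiency from the raw performance and power figures.
    print(f"Eurora recomputed: {megaflops_per_watt(110, 34.7):.0f} MF/W")
    print(f"Titan recomputed:  {megaflops_per_watt(17590, 8210):.0f} MF/W")

    # Compare Eurora's quoted figure against the other machines.
    for name in ("Beacon", "Titan", "JuQueen"):
        gain = percent_gain(QUOTED["Eurora"], QUOTED[name])
        print(f"Eurora vs {name}: {gain:.0f} per cent more efficient")
```

Titan's 17.59 petaflops is entered as 17,590 teraflops and its 8.21 megawatts as 8,210 kilowatts so that both machines go through the same conversion.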
Of course, with only 110 teraflops sustained, Eurora is not a full-scale supercomputer. This is a perfectly respectable performance for a midrange box, though, and a fully loaded Aurora Tigon rack can hold 256 Tesla K20X GPU coprocessors and 256 Xeon E5 processors for a combined 350 teraflops with one of those CPUs deactivated.
And, say Eurotech and Nvidia, you could build a 3.1 petaflops system with just nine racks. It used to take several hundred racks and several hundred million dollars (at least) to do that. Now, you can do it on the cheap. How much, Cineca and Eurotech are not saying.
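The nine-rack claim follows from the per-rack figure, as this rough Python sketch shows. It assumes performance scales linearly with rack count, which real clusters rarely manage, and it takes the 350 teraflops per rack at face value without knowing whether that is a peak or sustained number. The constant and function names are hypothetical.

```python
import math

# Per-rack figures as quoted in the article for a fully loaded Aurora Tigon rack.
TERAFLOPS_PER_RACK = 350
GPUS_PER_RACK = 256
CPUS_PER_RACK = 256


def racks_needed(target_petaflops: float,
                 teraflops_per_rack: float = TERAFLOPS_PER_RACK) -> int:
    """Smallest whole number of racks that reaches the target.

    Assumes perfectly linear scaling, which is a simplification.
    """
    target_teraflops = target_petaflops * 1000
    return math.ceil(target_teraflops / teraflops_per_rack)


if __name__ == "__main__":
    racks = racks_needed(3.1)
    total_petaflops = racks * TERAFLOPS_PER_RACK / 1000
    print(f"{racks} racks give {total_petaflops:.2f} petaflops")
    print(f"{racks * GPUS_PER_RACK} GPUs and {racks * CPUS_PER_RACK} CPUs in total")
```

Nine racks at 350 teraflops apiece works out to 3.15 petaflops, which matches the vendors' 3.1 petaflops figure after rounding down.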
What they will say is that this hybrid approach with warm water cooling can cut electric bills by around 50 per cent compared to using chillers to cool the air in a standard data center, and that TCO compared to plain vanilla x86 clusters is anywhere from 30 to 50 per cent better.
This latter comparison pits an 1,800-node cluster using cold air against a similarly sized cluster using water blocks and warm water cooling; it does not take into account a shift of number crunching from CPUs to GPUs. ®