Original URL: https://www.theregister.com/2010/11/03/llnl_appro_edge_viz_cluster/

Nuke lab gets visual with GPU cluster

The yummy mummy of workstations

By Timothy Prickett Morgan

Posted in Channel, 3rd November 2010 05:00 GMT

Lawrence Livermore National Laboratory, one of the big nuke labs run by the US Department of Energy that is responsible for managing the American nuclear weapons stockpile and helping to design future nuclear weapons, has what amounts to a brand new frontal lobe for its myriad parallel supercomputers.

The new machine, called Edge, will comprise 216 two-socket server nodes from server maker Appro International. Each of the HyperPower machines is based on the GreenBlade blade servers, which feature two six-core Xeon 5600 processors. A total of 20 TB of main memory is plugged into the server nodes. All of the nodes in the cluster except the eight head nodes (that leaves 208 machines) are equipped with a single Nvidia Tesla M2050 fanless GPU co-processor.

That works out to 107 aggregate teraflops of number-crunching power on the Tesla 20 GPUs, plus another 29 teraflops on the CPU side of the Edge machine, which has 2,592 cores. The GPUs plug into the compute nodes through PCI-Express links and the nodes are linked to each other through a quad-data-rate InfiniBand network.
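For anyone who wants to check the sums, here is a rough back-of-envelope sketch in Python. The node, socket, core, and GPU counts come straight from the figures above. The per-GPU rate (Nvidia's rated 515 gigaflops peak double precision for the M2050), the 2.8 GHz CPU clock, and the four double-precision flops per core per cycle are assumptions that are not stated in this article; the clock speed in particular was picked because it reproduces the 29 teraflops figure. The function name `edge_peak` is made up for illustration.

```python
# Back-of-envelope check of the Edge cluster's peak figures.

# Counts taken from the article
NODES = 216               # two-socket server nodes
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6      # six-core Xeon 5600
GPU_NODES = 208           # nodes carrying one Tesla M2050 each

# Assumptions (NOT stated in the article)
M2050_DP_GFLOPS = 515.0   # rated peak double-precision gigaflops per M2050
CPU_CLOCK_GHZ = 2.8       # assumed clock; chosen because it matches 29 teraflops
FLOPS_PER_CYCLE = 4       # assumed double-precision flops per core per cycle


def edge_peak():
    """Return (cpu_cores, gpu_teraflops, cpu_teraflops) at theoretical peak."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    gpu_tflops = GPU_NODES * M2050_DP_GFLOPS / 1000.0
    cpu_tflops = cores * CPU_CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0
    return cores, gpu_tflops, cpu_tflops


if __name__ == "__main__":
    cores, gpu, cpu = edge_peak()
    print(f"CPU cores: {cores}")
    print(f"GPU peak:  {gpu:.1f} teraflops")
    print(f"CPU peak:  {cpu:.1f} teraflops")
```

Under those assumptions the script lands on 2,592 cores, roughly 107 teraflops on the GPUs, and roughly 29 teraflops on the CPUs. These are theoretical peaks, not measured performance.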

Yes, this would make one hell of a machine upon which to play Crysis, although you'd have to either run it in the WINE runtime environment on top of the Tri-Labs Operating System (a variant of Red Hat Enterprise Linux cooked up by the nuke labs) or sneak Windows HPC Server onto the boxes.

But that is beside the point and a waste of tax dollars. (Well, not really.) Edge is not used to crunch numbers like the ASCI Purple, BlueGene/L, and Dawn Power-based supers running at the lab today or the future 20-petaflops Sequoia BlueGene machine that LLNL requisitioned from Big Blue in February 2009 and which is set to go live in 2012. And the Edge machine is not, despite being packed to the gills with Nvidia Tesla GPU co-processors, being used to drive a massive display wall. Although, according to Becky Springmeyer, computational systems and software environment lead for the Advanced Simulation and Computing program at LLNL, the hardware certainly could be used to drive such a wall. LLNL already has those.

Double shift super

The Edge CPU-GPU hybrid cluster will have two different jobs at LLNL. First, it will be used to chew through the massive data sets generated by the big supers at the nuke lab to try to figure out what parts of the data sets to turn into visual representations so human beings can turn their eyes and frontal lobes on them and try to make some sense of the data.

The LLNL Edge cluster: the mother of all workstations

The Edge cluster, says Springmeyer, will also be used to help LLNL coders see how well or poorly their applications take to the CUDA programming environment for Tesla GPU co-processors. "What's really important is to see how these applications will port to these new machines," she says.

LLNL already has a data analysis and visualization cluster called Graph that is used for classified government projects. The Graph cluster, says Springmeyer, is much larger in terms of server node and CPU core count, but it does not have GPUs to try to goose its performance with cheap flops. On the outside of the spook firewall, LLNL runs another viz cluster that has about half as much oomph as the Edge box, which researchers working on various projects can stand in line and get access to. The big supers like Sequoia are used for public projects as they are burned in, and then they disappear behind the wall of secrecy. Edge will stay on this side of the wall.

LLNL did not have the precise price for the machine, but said it was on the order of $4m to $5m. ®