Appro notches up another Los Alamos super deal
Riding a Mustang to $10m
Supercomputer maker Appro International has lassoed itself another supercomputer win at the US Department of Energy's Los Alamos National Laboratory.
Los Alamos would no doubt have preferred that Advanced Micro Devices had gotten more of its 16-core "Interlagos" Opteron 6200 processors out of its GlobalFoundries wafer-baking partner and that 56Gb/sec FDR InfiniBand was also a little more mature. But the "Mustang" supercomputer that Los Alamos is taking from Appro is no slouch and offers pretty good bang for the buck compared to the alternatives for machines that scale from hundreds of teraflops up to a petaflops or so. The Mustang will more than double up the lab's unclassified computing capacity. The $10m machine weighs in at $28,328 per teraflops – about half the price per teraflops of a Cray XE6 with its snazzier and lower-latency "Gemini" interconnect.
The Los Alamos "Mustang" super built by Appro
The Mustang super is built from Appro's Xtreme-X EL Series machines, which use two-socket nodes with Opteron 6100-class processors and a single-rail QDR InfiniBand network to cross-connect the blade server nodes in a fat tree configuration.
Each rack of Xtreme-X EL Series machines has 64 server nodes for a total of 1,536 cores. In the Mustang machine, the nodes are configured with a mere 64GB of main memory. Los Alamos had the option of going with the MR Series, which has a dual-rail fat tree setup based on QDR InfiniBand. The HE Series of Xtreme-X machines has a dual-rail 3D torus topology, which makes it easier to extend the cluster with more nodes. You don't have to use the 3D interconnect to build a fat Xtreme-X machine, and in fact Los Alamos is putting together 25 racks of these Xtreme-X boxes to build a 1,600-node system with 38,400 cores, 102.4TB of aggregate main memory, and a peak theoretical performance of 353 teraflops.
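For anyone who wants to check the sums, here is a minimal sketch of how the Mustang totals fall out of the per-rack figures. The rack, node, core, memory, and price numbers come from the story; the 12 cores per socket is implied by 1,536 cores across 64 two-socket nodes. The 2.3GHz clock and four floating point operations per core per cycle are assumptions, chosen because they reproduce the quoted 353 teraflops, and are not figures confirmed by Appro or Los Alamos.

```python
# Back-of-the-envelope sizing for the Mustang cluster.
RACKS = 25
NODES_PER_RACK = 64
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 12        # implied: 1,536 cores / 64 nodes / 2 sockets
MEM_PER_NODE_GB = 64
PRICE_USD = 10_000_000
QUOTED_PEAK_TFLOPS = 353     # peak figure quoted in the story

# Assumptions (not stated in the story), picked to match the quoted peak.
ASSUMED_CLOCK_GHZ = 2.3
ASSUMED_FLOPS_PER_CYCLE = 4  # per core


def mustang_summary():
    """Return the derived totals for the cluster as a dict."""
    nodes = RACKS * NODES_PER_RACK
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    memory_tb = nodes * MEM_PER_NODE_GB / 1000          # decimal terabytes
    # cores x flops/cycle x GHz gives gigaflops; divide by 1,000 for teraflops
    peak_tflops = cores * ASSUMED_FLOPS_PER_CYCLE * ASSUMED_CLOCK_GHZ / 1000
    usd_per_tflops = PRICE_USD / QUOTED_PEAK_TFLOPS
    return {
        "nodes": nodes,
        "cores": cores,
        "memory_tb": memory_tb,
        "peak_tflops": peak_tflops,
        "usd_per_tflops": usd_per_tflops,
    }


if __name__ == "__main__":
    for name, value in mustang_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions the peak comes out a shade over 353 teraflops, and dividing the $10m price by the quoted 353 teraflops gives the roughly $28,328 per teraflops cited for the machine.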
Appro pre-assembling and testing the Mustang super in its factory
If AMD had already fully ramped up the Interlagos processors, the Mustang super would probably weigh in at around 475 teraflops of floating point oomph, depending on the clock speeds that will be used in the Mustang machine's Opteron 6100 processors and the possible speed options in the Opteron 6200 lineup. The good news is that both the current and impending Opteron processors are compatible with the same G34 socket, so Los Alamos can upgrade the CPUs at will if it wants that extra performance.
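The Interlagos what-if can be roughed out the same way. This is a sketch under stated assumptions, not AMD's or Appro's numbers: it keeps the 1,600 two-socket nodes from the story, swaps 12-core chips for 16-core ones, and assumes an unchanged 2.3GHz clock and four floating point operations per core per cycle. The function name `peak_tflops` is made up for illustration.

```python
def peak_tflops(nodes, sockets_per_node, cores_per_socket,
                clock_ghz, flops_per_cycle=4):
    """Peak theoretical teraflops for a homogeneous cluster.

    flops_per_cycle is per core and is an assumption, not a vendor spec.
    """
    cores = nodes * sockets_per_node * cores_per_socket
    # cores x flops/cycle x GHz gives gigaflops; divide by 1,000 for teraflops
    return cores * flops_per_cycle * clock_ghz / 1000


NODES = 1600               # from the story
ASSUMED_CLOCK_GHZ = 2.3    # assumption: same clock before and after the swap

current = peak_tflops(NODES, 2, 12, ASSUMED_CLOCK_GHZ)   # 12-core Opteron 6100
upgraded = peak_tflops(NODES, 2, 16, ASSUMED_CLOCK_GHZ)  # 16-core Opteron 6200

if __name__ == "__main__":
    print(f"current:  {current:.1f} teraflops")
    print(f"upgraded: {upgraded:.1f} teraflops")
    print(f"gain:     {upgraded / current:.2f}x")
```

With the clock held constant, the gain is simply the 16-to-12 core ratio, which lands in the neighborhood of the 475 teraflops estimate; a faster or slower Opteron 6200 part would move the result accordingly.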
Los Alamos has five other clusters in its non-classified computing environment, which is collectively called Turquoise; together they have nearly 300 teraflops of aggregate computing capacity. These include a baby version of the Opteron-Cell hybrid called "Roadrunner" which was built for Los Alamos by IBM. Roadrunner was used for open science as it was being built and tested, just to see how applications would scale on a 1.1 petaflops hybrid supercomputer.
But in October 2009, Roadrunner was taken behind the wall of secrecy to do its top-secret nuclear missile management and design work. Some researchers had apps running on this box, and to help them keep going, Los Alamos maintains a 153 teraflops baby Roadrunner, named Cerillos, for this unclassified work. The Turquoise network has nearly double the bandwidth to storage arrays thanks to a recent upgrade, as well as anywhere from 20 to 100 times the bandwidth going to outside networks after another recent upgrade. In November, Los Alamos plans to kick in another 480TB of storage for the Turquoise network.
As part of the expanded Turquoise network, Mustang will be used to do modeling and simulation for oceans, wildfires, plasmas, advanced materials, and nuclear reactions.
Los Alamos has been buying supercomputers from Appro since 2005, and the upstart HPC system supplier has also landed big contracts at Lawrence Livermore and Sandia National Laboratories, two of the handful of nuke-super centers controlled by the US Department of Energy. These three are often called Tri-Labs, and they have created their own variant of Red Hat's Enterprise Linux distro to run on their various massively parallel machines.
Back in June, Appro inked a two-phase deal that will see a 6 petaflops Xtreme-X machine based on Intel's "Sandy Bridge-EP" Xeon E5 processors being spread across the three labs. This machine – well, it is really three clusters – will use a dual-rail QDR InfiniBand interconnect between the two-socket blade nodes; 3 petaflops of iron goes into the labs in the first quarter of 2012 and by the third quarter, all 6 petaflops should be up and running. There's a chance that some of the nodes will get Nvidia GPU coprocessors to goose their performance. That procurement for Tri-Labs came in at under $15,000 per teraflops, which just goes to show you that if you want a good price for anything in this world, you have to buy in bulk.
In September, Appro inked a deal for an 800 teraflops hybrid Intel Xeon E5-Nvidia M2090 GPU supercomputer with the University of Tsukuba in Japan.
Clearly, Appro is not only learning how to build bigger machines, but also how to close bigger deals. ®