SGI bags vanishing stimulus dough

Two supers and 100 teraflops for Los Alamos

The Obama stimulus money may be running out, but the US Department of Energy has just kicked a chunk of it over to Silicon Graphics to buy two new Altix XE parallel clusters.

The DOE already has a petaflops-class hybrid blade supercomputer: the IBM "Roadrunner" Opteron-Cell super that reported for classified nuke duty at Los Alamos National Laboratory last fall. The lab also runs all sorts of unclassified supercomputing work.

But more oomph is always welcome.

Like other DOE labs, Los Alamos has a little of everything. The Roadrunner super is, at the moment, the lab's flagship machine, but it doesn't have an upgrade path. IBM killed off its QS21 Cell-based blade earlier this year and stopped development on its own Cell chips last November — although Sony and Toshiba are apparently still using Cell derivatives in future products.

And as for the Opteron blade halves of the Roadrunner machine, there's no upgrade path there, either.

So far, IBM has done as little as possible to support Advanced Micro Devices' Opteron 4100 and 6100 processors, launching only one machine using the twelve-core Opteron 6100s. This machine, the System x3755 M3, is a perfectly good 2U, 48-core box, but it's not a follow-on to the LS41 blade server used in the Roadrunner. (There is an LS42 two-socket blade server available from IBM, but it only supports the earlier six-core Opteron 2400 chips.)

The word on the street is that disgraced IBM executive and former Systems and Technology Group general manager Robert Moffat, who lost his job and is going to prison because of an insider-trading scandal, killed off IBM's internal use of the Cell in blades for supercomputers. It's likely that Moffat, famous for his sharp knife inside of IBM's supply chain, was responsible for pulling back on development of AMD servers, too.

It's no wonder, then, that Los Alamos, which is perfectly happy with Roadrunner inasmuch as it proved the hybrid supercomputing concept, has been shopping around for different systems from different vendors.

SGI has not, as yet, convinced Los Alamos to be the showcase account for the Altix UV 1000 shared-memory supers, which would mean pushing clusters built on the shared-memory NUMAlink 5 interconnect up into the petaflops range.

Some lab, somewhere, will get the funding and do it, sooner rather than later. For now, though, SGI will have to be happy with peddling two nearly identical Altix XE 1300 clusters, based on quad-core Xeon 5500 processors, to the nuke lab. Each machine has over 4,500 x64 cores and 14TB of unshared memory, and is rated at 50 teraflops. The first cluster has already been delivered.
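For the curious, the quoted figures can be sanity-checked with a little arithmetic. The sketch below is only a back-of-envelope estimate: it treats "over 4,500 cores" as exactly 4,500 (so the per-core numbers are upper bounds), counts a terabyte as 1,000GB, and assumes that the 50 teraflops rating is a theoretical peak and that each Xeon 5500 core retires four double-precision flops per clock. Neither of those last two assumptions is stated in the figures above, and the function names are purely illustrative.

```python
# Back-of-envelope check on the cluster figures quoted in the article.
CLUSTERS = 2
CORES_PER_CLUSTER = 4_500        # article says "over 4,500", so this is a floor
TFLOPS_PER_CLUSTER = 50.0        # rated figure from the article
MEMORY_TB_PER_CLUSTER = 14.0     # unshared memory per cluster

# Assumption (not from the article): 4 double-precision flops per clock per core.
FLOPS_PER_CYCLE = 4


def gflops_per_core(tflops, cores):
    """Rated gigaflops attributable to each core."""
    return tflops * 1_000 / cores


def implied_clock_ghz(tflops, cores, flops_per_cycle=FLOPS_PER_CYCLE):
    """Clock speed implied if the rating is a theoretical peak."""
    return gflops_per_core(tflops, cores) / flops_per_cycle


def memory_gb_per_core(memory_tb, cores):
    """Memory per core, using decimal units (1TB = 1,000GB)."""
    return memory_tb * 1_000 / cores


total_tflops = CLUSTERS * TFLOPS_PER_CLUSTER

if __name__ == "__main__":
    print(f"Combined rating: {total_tflops:.0f} teraflops")
    print(f"Per core: {gflops_per_core(TFLOPS_PER_CLUSTER, CORES_PER_CLUSTER):.2f} gigaflops")
    print(f"Implied clock: {implied_clock_ghz(TFLOPS_PER_CLUSTER, CORES_PER_CLUSTER):.2f} GHz")
    print(f"Memory per core: {memory_gb_per_core(MEMORY_TB_PER_CLUSTER, CORES_PER_CLUSTER):.2f} GB")
```

Under those assumptions, the two clusters together account for the 100 teraflops in the subhead, and the per-core figures fall out of simple division. A higher actual core count would pull each per-core number down.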

The SGI machines will be plugged into the unclassified meta-cluster of machines at Los Alamos, called the Turquoise Network. They will be used by the ASC group, which does nuclear weapons and stockpile work, as well as by the DOE Office of Science, which uses the Los Alamos machines for climate, ocean, and sea-ice modeling simulations.

Los Alamos is shifting its cash pile away from IBM and toward Cray in the latest round of budget largesse. In April, the lab tapped Cray on the shoulder with $45m in funding to get its hands on one of the first XE6 Opteron-based supers with the Opteron 6100 blades and the new "Gemini" XE interconnect.

See what happens when you don't upgrade your blades, Big Blue?

Los Alamos has nicknamed the Cray machine "Cielo," and expects installation of the super to start this quarter, with additional nodes added in 2011 to bring it up to its petaflops performance level. ®

