Nuke lab tests flashy HPC server cluster
Appro and Fusion-io plug into Hyperion
It's Flashturbation of a different kind. US nuke lab Lawrence Livermore National Laboratory has tapped longtime HPC cluster partner Appro International to custom-build a 100 TB flash-based storage system for its Hyperion x64 testbed cluster.
Project Hyperion was all the talk at the SC08 supercomputing show a year and a half ago, with Intel and Dell getting the big bits of the contract to build the testbed machine and Super Micro, QLogic, Cisco Systems, Mellanox, DataDirect Networks, LSI, Red Hat, and Sun Microsystems (now part of Oracle) all getting some pieces of the action.
The Hyperion super was not so much a machine as an ongoing procurement on the part of the three big nuke labs - LLNL, Los Alamos National Laboratory, and Sandia National Laboratories - to have the latest technologies available to put through their paces before deciding on including them in production machines. The initial machine had 1,152 server nodes, built by Dell and using Intel's quad-core Xeon processors, delivering 90 teraflops of number-crunching power, 11 TB of aggregate main memory, and 36 GB/sec of bandwidth out to storage subsystems.
The US Department of Energy paid $5m to build Hyperion, and the contractors kicked in $5.5m (most likely in free equipment and services) to cover the other half of the cluster. LLNL estimated the fair market value of the system at between $20m and $25m when it was completed in May 2009.
Now the Hyperion machine is getting a serious data I/O upgrade. LLNL has asked Appro to take an x64 server and create a flash disk appliance for the Hyperion machine using Fusion-io's ioMemory modules. Appro and Fusion-io have cooked up a flash appliance that crams two 640 GB ioDrive Duo (multi-level cell, or MLC) flash modules into a single 1U x64-based server.
Two racks of these ioSANs, as Appro and Fusion-io are calling them, yield over 100 TB of storage capacity with an aggregate of 40 million I/O operations per second. The two vendors reckon it would take 43 racks of servers and a hell of a lot more electricity to get the same I/O from disk-based storage.
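For those who like to check the sums, here is a back-of-the-envelope sketch in Python that works only from the figures quoted above. It assumes decimal units (1 TB = 1,000 GB) and that capacity and I/O are spread evenly across identical 1U appliances; neither Appro nor Fusion-io has confirmed the actual node count, so treat the results as rough estimates rather than the real configuration. The variable names are ours, for illustration only.

```python
import math

# Figures quoted in the article
MODULE_GB = 640              # one ioDrive Duo module
MODULES_PER_SERVER = 2       # two modules per 1U appliance
TARGET_TB = 100              # "over 100 TB", so this is a floor
AGGREGATE_IOPS = 40_000_000  # quoted aggregate I/O operations per second
FLASH_RACKS = 2
DISK_RACKS = 43              # vendors' estimate for equivalent disk I/O

# Assumption: decimal units, 1 TB = 1,000 GB
server_gb = MODULE_GB * MODULES_PER_SERVER

# Minimum number of appliances needed to reach the quoted capacity
servers = math.ceil(TARGET_TB * 1000 / server_gb)

# Assumption: appliances are split evenly across the racks
servers_per_rack = math.ceil(servers / FLASH_RACKS)

# Assumption: I/O is spread evenly across appliances
iops_per_server = AGGREGATE_IOPS / servers

# How many disk racks each flash rack stands in for, per the vendors
rack_ratio = DISK_RACKS / FLASH_RACKS

print(f"Capacity per appliance: {server_gb} GB")
print(f"Appliances needed:      {servers}")
print(f"Appliances per rack:    {servers_per_rack}")
print(f"IOPS per appliance:     {iops_per_server:,.0f}")
print(f"Disk racks per flash rack: {rack_ratio}")
```

On these assumptions the sums come out to around 40 1U appliances per rack, which fits a standard rack, and roughly half a million I/O operations per second per box.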
LLNL is no stranger to Appro: the nuke lab has five of the company's Opteron-based clusters (with InfiniBand interconnect) today that made the June 2010 Top 500 supercomputer rankings. However, don't get the wrong idea: LLNL is by no means an Appro-only shop, and was the first place to get a BlueGene super from IBM. In fact, in 2012, LLNL is taking delivery of a massive 20 petaflops BlueGene box from IBM, nicknamed "Sequoia," and has a puppy BlueGene/P machine in the works named "Dawn" that is rated at 501 teraflops. The nuke labs never met a supercomputer they didn't like or a technology they could not afford.
This is not Appro's first foray into the use of flash memory in supercomputers, either. Last November, the San Diego Supercomputer Center got a $20m grant from the National Science Foundation to build its second flash-based parallel super. That machine, called "Gordon" of course, will be based on Intel's "Sandy Bridge" Xeons and will implement a shared memory system based on ScaleMP's vSMP clustering technology. The server nodes in Gordon will have an aggregate of 245 teraflops of computing power (based on Appro's Xtreme-X servers), 64 TB of main memory implemented as a single memory space, 256 TB of flash memory, and 4 PB of disk drive capacity. The idea behind the Gordon machine is not to get as many flops in a box as possible, but to get CPU capacity and I/O capacity back into synch, thereby using CPU cycles more efficiently.
This is precisely the same aim LLNL has in adding flash storage to the Project Hyperion testbed super. ®