What do you call a flash-based super?
Gordon, of course
Boutique supercomputer maker Appro International has won a deal at the San Diego Supercomputer Center (SDSC) to build a next-generation parallel supercomputer packed with flash memory and running software that creates a virtual shared-memory infrastructure spanning the server nodes in the cluster.
The flash-based server, naturally called Gordon, is being paid for through a $20m grant from the National Science Foundation, and is actually the second HPC cluster at SDSC to feature flash-based storage to speed up computational work.
The first one, called Dash, became operational in September. It delivers a mere 5.2 teraflops and is based on Appro GreenBlade blade servers equipped with Intel Xeon 5500 Nehalem EP processors and Intel SATA-style flash drives. This cluster has only 68 nodes, each with 48GB of main memory, connected to each other by an InfiniBand network; four of the nodes are set up as I/O nodes, each with 1TB of flash drives for a total of 4TB of flash. The whole shebang is equipped with ScaleMP's vSMP Foundation virtual symmetric multiprocessing software, which gives programs running on Dash a single 768GB address space to play in across 16 nodes plus one flash-based node for I/O. Four of these "supernodes," as SDSC calls them, are linked together to make the Dash system, which is currently parked on the TeraGrid shared computing grid.
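The Dash figures hang together arithmetically; a quick sketch (node counts and capacities are from the article, the sums are ours):

```python
# Back-of-envelope check of the Dash numbers: 16 compute nodes fused by vSMP
# into one address space, plus four flash-equipped I/O nodes across the system.
NODES_PER_VSMP_IMAGE = 16   # compute nodes in one vSMP supernode
MEM_PER_NODE_GB = 48        # main memory per GreenBlade node
IO_NODES = 4                # flash-equipped I/O nodes, one per supernode
FLASH_PER_IO_NODE_TB = 1    # flash capacity per I/O node

vsmp_address_space_gb = NODES_PER_VSMP_IMAGE * MEM_PER_NODE_GB  # 768GB, as stated
total_flash_tb = IO_NODES * FLASH_PER_IO_NODE_TB                # 4TB, as stated
```

Sixteen 48GB nodes do indeed give the single 768GB address space the vSMP software exposes.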
Gordon takes these blade-based supernodes and puts them on steroids, then lashes them all together with a higher-bandwidth InfiniBand network to create a cluster that weighs in at 245 teraflops of aggregate peak floating-point performance. The hope is that, thanks to the flash and the vSMP software, the machine's sustained performance will come a lot closer to that peak than is typical.
The Gordon cluster, says Appro, will be based on "the latest Intel Xeon processors available in 2011," when the box is expected to be fully configured and deployed. That should mean Sandy Bridge Xeon processors.
SDSC has chosen the high-end Xtreme-X platform from Appro as the foundation for Gordon, and the resulting machine will have 64TB of total main memory, 256TB of flash memory, and 4PB of disk capacity. This is arguably not a particularly powerful machine by some measures, but the 245 teraflops is not the point of the Gordon design, according to Allan Snavely, associate director at the SDSC and co-principal investigator for this innovative system. Speeding up I/O is.
"Moving a physical disk-head to accomplish random I/O is so last-century," says Snavely. "Indeed, Charles Babbage designed a computer based on moving mechanical parts almost two centuries ago. With respect to I/O, it's time to stop trying to move protons and just move electrons. With the aid of flash solid-state drives, this system should do latency-bound file reads ten times faster and more efficiently than anything done today."
The goal of the Gordon design is, in fact, to get the ratio of terabytes of addressable memory to teraflops of peak performance closer to 1-to-1. In a typical cluster today, the ratio is more like 1-to-10, and that is out of whack for HPC workloads that might not need to do a lot of math compared to the amount of data they have to shuffle around.
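On the figures quoted for Gordon, the ratio works out roughly as the designers intend, if you count the flash as part of the addressable memory pool (our interpretation, since vSMP folds the flash nodes into the shared address space):

```python
# Memory-to-flops ratio for Gordon, treating flash as addressable memory.
# Capacities and peak performance are the article's figures; the division is ours.
main_memory_tb = 64
flash_tb = 256
peak_tflops = 245

gordon_ratio = (main_memory_tb + flash_tb) / peak_tflops  # roughly 1.3TB per teraflop
```

That lands at about 1.3TB of memory per teraflop, comfortably on the 1-to-1 side of the ledger rather than the 1-to-10 typical of clusters today.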
With the Gordon design, SDSC is building a system that has 32 supernodes using what is presumed to be the Sandy Bridge Xeons due in early 2011. Assuming clock speeds around 4GHz and core counts ranging from four to eight, the floating-point performance of a Sandy Bridge processor could come in anywhere between 64 and 256 gigaflops. SDSC is not saying how big each supernode is in terms of processor, socket, or server count, but says that each node will have 240 gigaflops of performance and 64GB of main memory, and that there will be 32 nodes in a supernode. A supernode will therefore have 7.7 teraflops of performance and 10TB of memory (2TB of main memory and 8TB of flash memory). If you do the math, it looks like each server node will have two Sandy Bridge Xeons running at 3.75GHz, plus access to that 8TB of flash capacity, stored on two server nodes, each with 4TB of Intel flash. The 32 server nodes in the supernodes will be glued together using vSMP.
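Doing that math explicitly, the per-node figures scale up to the cluster totals quoted earlier, and the 3.75GHz inference drops out if you assume two sockets per node and 32 double-precision flops per cycle per socket (say, four AVX cores at 8 flops per cycle, which is our assumption, not SDSC's):

```python
# Scaling the stated per-node figures up to supernode and cluster totals.
GFLOPS_PER_NODE = 240       # stated per-node peak
MEM_PER_NODE_GB = 64        # stated per-node main memory
NODES_PER_SUPERNODE = 32
SUPERNODES = 32
FLASH_PER_SUPERNODE_TB = 8

supernode_tflops = GFLOPS_PER_NODE * NODES_PER_SUPERNODE / 1000.0       # 7.68, the quoted 7.7
supernode_main_mem_tb = MEM_PER_NODE_GB * NODES_PER_SUPERNODE / 1024.0  # 2TB
cluster_tflops = supernode_tflops * SUPERNODES                          # about 245.8
cluster_main_mem_tb = supernode_main_mem_tb * SUPERNODES                # 64TB
cluster_flash_tb = FLASH_PER_SUPERNODE_TB * SUPERNODES                  # 256TB

# Clock-speed inference: 240 gigaflops split across two sockets, assuming
# 32 double-precision flops per cycle per socket (an assumption on our part).
SOCKETS_PER_NODE = 2
FLOPS_PER_CYCLE_PER_SOCKET = 32
implied_ghz = GFLOPS_PER_NODE / (SOCKETS_PER_NODE * FLOPS_PER_CYCLE_PER_SOCKET)  # 3.75
```

Those totals match the 245 teraflops, 64TB of memory, and 256TB of flash quoted for the full machine, and the implied clock is the 3.75GHz the article arrives at.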
Appro says that it is connecting the supernodes to each other with a 16Gb/sec bi-directional InfiniBand network, a speed that seems odd considering that current InfiniBand runs at 40Gb/sec.
Gordon is aimed at what Michael Norman, the other principal investigator on the super's design, calls a growing list of critical data-intensive problems. SDSC hopes to use the box to help analyze individual genes to tailor drugs specifically for patients, to predict the effects of earthquakes on buildings and roads, and to perform climate modeling. ®