Football U goes to 3Leaf for HPC
Hut one, hut two, MPI
Florida State University, like most state schools in the US with a storied (American) football program, also has a respectable comp sci department that is not afraid to spend a little cash on a new technology when it comes along. And that's how the Seminoles have ended up being one of the first customers of upstart server maker 3Leaf Systems.
Back in November 2009, after nearly six years of development, 3Leaf brought to market a shared memory clustered system called the Dynamic Data Center Server line. The DDC Server machines take Opteron-based SMP servers and a homegrown chipset called Voyager to lash together and virtualize the cores, memory, and I/O slots in the machines in the cluster.
When equipped with 3Leaf's own virtualization layer, the DDC machine can be made to look and feel like a giant SMP server or a regular HPC cluster based on the Message Passing Interface (MPI). This is one of the reasons why the DDC machines may be the vanguard of future systems that will allow users to toggle from loose coupling with unshared memory to tighter coupling with shared memory implemented over super-fast networks. This way, as workloads change, the hardware doesn't have to. It just reconfigures itself on the fly.
FSU's Department of Scientific Computing has a bunch of systems that support processing related to molecular biophysics, evolutionary biology, network modeling, and Monte Carlo algorithm development. The key HPC system at the university has 256 of Dell's PowerEdge SC1435 blade servers, which are two-socket Opteron-based blade servers from a few generations ago, plus eight Dell PowerEdge 6950 servers that act as head nodes for the cluster. The nodes in the cluster are glued together with a 288-port 20 Gb/sec InfiniBand switch from Cisco Systems, and two Cisco 6500 switches link to a 156 TB storage array from Panasas. This machine is rated at a relatively modest 8 teraflops, and it was clearly due for some kind of boost.
According to Bob Quinn, the founder, chairman, and chief technology officer at 3Leaf, FSU is starting out relatively small, with a twelve-node DDC Server that is based on two-socket systems crafted by motherboard and whitebox server maker Super Micro (which has created a special motherboard with room for the Voyager ASIC and which is also 3Leaf's hardware manufacturing partner). The system uses the 2.6 GHz Opteron 8400 processors from Advanced Micro Devices (these are the six-core chips known by the code-name "Istanbul"), and it has 144 cores in total. The dozen nodes are linked by a 40 Gb/sec InfiniBand switch, and the cluster has 576 GB of shared coherent memory.
The nodes have a modest 6 TB of local storage. This box, says Quinn, sells for $206,000. Assuming the Istanbuls do a peak of four floating point operations per core per clock cycle, this machine should be rated at around 1.5 teraflops. By modern standards, this is not a lot of number-crunching power. But Florida State is not the University of Illinois, either, which is getting the 1 petaflops "Blue Waters" super made by IBM, the fastest machine currently installed at a university in the United States. (The Seminoles might be able to pull even with the Fighting Illini on the gridiron, even though they have been bested on the compute grid.)
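For readers who want to check the 1.5 teraflops figure, here is a minimal sketch of the arithmetic. The function name `peak_teraflops` is made up for illustration, and the four-flops-per-cycle figure is the same assumption the article makes, applied per core; the node, socket, core, and clock numbers come from Quinn's description of the FSU box.

```python
def peak_teraflops(nodes, sockets_per_node, cores_per_socket,
                   clock_ghz, flops_per_cycle):
    """Return (total cores, theoretical peak in teraflops)."""
    cores = nodes * sockets_per_node * cores_per_socket
    # cores * GHz * flops per cycle gives gigaflops
    gigaflops = cores * clock_ghz * flops_per_cycle
    return cores, gigaflops / 1000.0


# FSU's DDC Server: 12 two-socket nodes, six-core 2.6 GHz Istanbuls,
# assuming 4 floating point operations per core per clock cycle.
cores, tflops = peak_teraflops(12, 2, 6, 2.6, 4)
print(cores, round(tflops, 2))
```

This is a theoretical peak only; sustained performance on real workloads would be lower.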
The DDC Server lineup can scale up to 16 server nodes in a shared memory configuration, so FSU has a little room to grow. And Quinn says that it will be able to support the impending twelve-core "Magny-Cours" Opteron 6100s in the DDC Servers by the third quarter of 2010.
Scalability was not really top of mind for FSU, according to Quinn. What mattered to them, and what matters to the companies that 3Leaf is going to chase with its current and future (and much more scalable) machines, is the flexibility of the DDC Server systems software, which allows for nodes to be partitioned and repartitioned in a matter of seconds and configured with new software stacks in 30 seconds or so.
The MPI protocol can run atop the shared memory architecture without any changes to the code, too, so shifting between shared memory and distributed applications is no big deal. Up until now, customers had to decide which way they wanted to go and buy a machine to do one or the other.
While the initial DDC Servers are interesting, it looks like the systems will be far more scalable with the "Sandy Bridge" Xeon servers due from Intel in 2011. Quinn says that since the middle of 2008, 3Leaf has been working with Intel on modifying the Voyager ASIC so it can make use of the second generation of Intel's QuickPath Interconnect. The DDC Servers using the Sandy Bridge Xeons will be able to support up to 64 TB of shared memory, and "a large number" of server nodes and cores, according to Quinn. He would not specify how many nodes. But the word on the street is that it will scale to 32 nodes.
The Sandy Bridge Xeons aimed at two-socket servers are expected to have at least eight cores, and possibly more in machines where thermals are not as big an issue. Assume that for Sandy Bridge-EP class machines Intel can push it up to a dozen cores, which it has to do to compete with Magny-Cours. If the 32-node rumor holds, a single 3Leaf machine in the Sandy Bridge era should be able to house 768 cores using two-socket boxes.
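The 768-core figure, and the memory each node would have to carry, can be sketched the same way. Everything here is speculative: the 32 nodes are the rumored limit, the 12 cores per socket are the article's assumption, and the 64 TB is Quinn's stated ceiling. The helper name `cluster_scale` is hypothetical.

```python
def cluster_scale(nodes, sockets_per_node, cores_per_socket, shared_memory_tb):
    """Return (total cores, shared memory per node in TB)."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    memory_per_node_tb = shared_memory_tb / nodes
    return total_cores, memory_per_node_tb


# Rumored 32 two-socket nodes, assumed 12 cores per socket,
# and the 64 TB shared memory ceiling cited by Quinn.
total_cores, memory_per_node_tb = cluster_scale(32, 2, 12, 64)
print(total_cores, memory_per_node_tb)
```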
How is 64 TB going to be crammed into 32 nodes, you ask? Well, brace yourself, because a terabyte per node is going to be normal with the server chips that Intel and AMD are cooking up for this year. So doubling that to 2 TB per node in a year is, well, just Moore's Law. ®
shared memory is the important part
The important part of this system is the shared memory, allowing for SSI (single system image) operation. Certain computations parallelize fine, but each portion requires access to the full working set, and possibly large amounts of data passed between working threads. This is VERY slow with message passing, but trivial with shared memory.
there's been a dearth of systems recently that have shared memory past 1 motherboard,
Tbyte of RAM
"...a terabyte per node is going to be normal with the server chips that Intel and AMD are cooking up for this year."
Very interesting - a reference here would be great. I know Nehalem EX is coming, but not sure they can take that much RAM.