SGI previews UltraViolet Nehalem EX blade clusters
Taking orders now for next summer ships
SC09 Sometimes a server announcement is as defensive as it is offensive. So it is with the UltraViolet Altix UV big bad blades that Silicon Graphics showed off at SC09 in Portland, Oregon, Monday afternoon.
Even before it went into bankruptcy for the final time and was eaten by Rackable Systems, which subsequently took the SGI name and, more importantly, kept the UltraViolet shared memory supercomputer project alive, SGI was hinting that it was putting a lot of wood behind the future Nehalem EX arrowheads (well, chips) from Intel, and only later did it proclaim its commitment to the current Itanium dual-core chips and the Altix 4700 shared memory systems. The company would not commit to the Tukwila quad-core Itaniums earlier this year, and nothing has changed on that front. And very likely, unless some big HPC center pays for SGI to care, by this time next year SGI will be looking at the Altix 4700s as a legacy platform.
The reason why is simple: by combining the next-generation NUMAlink 5 interconnect with Intel's future eight-core Nehalem 7500 chips (also known by the code-name Beckton and more broadly as the Nehalem EX), SGI eliminates a barrier that has plagued it since 1998 when it chose the Itanium chip as its future processor: applications coded for x86 and now x64 processors do not run on its Itanium machines. Although SGI started selling Xeon-based Altix XE clusters in late 2006, these machines are just blade-style clusters; they're missing the NUMAlink interconnect that implements shared memory and that makes SGI different from other server makers and hence worth taking a risk on.
With the UltraViolet massively parallel machines, which will be sold as the Altix UV line, SGI will have the option of shaking off the Itanium albatross around its neck and focusing on regular x64 chips from Intel - if that is what its customers are willing to pay for. SGI won't say it that way, of course, because it wants to continue to sell and support Altix 4700 boxes, and quite possibly sell Altix 4800s employing Tukwila Itaniums for those customers who don't want to port their applications back to x64 iron. (This seems the most likely course, particularly if Intel is looking for Itanium buddies and cuts SGI a good deal on chip prices.)
At the heart of the Altix UV machines is not the Intel Nehalem EX processor, but the NUMAlink 5 interconnect, which delivers 15GB/sec of interconnect bandwidth between the blades in the Altix UV cluster, and under 1 microsecond of latency on those links. That's twice as much bandwidth as the NUMAlink 4 interconnect used in the Altix 4700s. And because the Itanium processor had lots of memory bandwidth and SGI designed the memory controller used in the Altix 4700s (the Itaniums do not get on-chip memory controllers until the Tukwilas ship in the first quarter of next year), Geoffrey Noer, senior director of product marketing at SGI, says that the Altix 4700s did all right in the memory-bandwidth department. The contrast is with the two-socket Xeon world, where the Nehalem EP Xeon 5500 servers are offering three to four times the memory bandwidth of their predecessors, depending on the test.
SGI is not allowed to divulge the feeds and speeds of the Nehalem EX chips as part of its Altix UV coming-out party at SC09, but it is giving customers the basic shape of the machines, how they can scale, and what kind of performance they will offer.
Blades, with a twist
Like the Altix 4700 machines, the Altix UV boxes are based on a blade architecture. The Nehalem EX chips are designed to be used in servers with four sockets or more, and their related Boxboro chipset can be used to make glueless eight-socket servers packing up to 64 cores in a single symmetric multiprocessing image. But there is no law that says you have to build a basic blade with four sockets.
In fact, to make room for the NUMAlink 5 interconnect, SGI is putting two Nehalem EX processors on a blade instead of four; some of the QuickPath Interconnect channels have to be used to talk to the NUMAlink 5 hub and some real estate on the system board has to be freed up for this. Each Nehalem EX socket has four Millbrook memory buffers, eight DDR3 RDIMMs, and four QPI links. Each socket has a maximum of 64GB of main memory using 8GB DIMMs. Two of the QPI links cross-couple the two processor sockets, and one goes to the Boxboro chipset, which talks to local I/O slots and controls communication between the processors and memory on the blade. The last QPI link goes to the UltraViolet hub that implements the NUMAlink 5 interconnect. That hub has two FB-DIMM slots for storing NUMAlink directory information and four links out to the NUMAlink interconnect. With those four links going out to the NUMAlink router, which implements an 8x8 (paired node) 2D torus, SGI can create a global shared-memory system that spans 256 processor sockets and 16TB of main memory.
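For anyone who wants to check the sums, the per-socket and system-wide memory figures above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted in this story; the constant and function names are made up for illustration and are not anything SGI or Intel publishes.

```python
# Back-of-the-envelope check of the memory and QPI figures quoted above.
# All names here are illustrative assumptions; only the numbers come from the article.

DIMMS_PER_SOCKET = 8      # eight DDR3 RDIMMs per Nehalem EX socket
DIMM_SIZE_GB = 8          # using 8GB DIMMs
MAX_SOCKETS = 256         # sockets in one shared-memory system


def memory_per_socket_gb(dimms=DIMMS_PER_SOCKET, dimm_gb=DIMM_SIZE_GB):
    """Maximum main memory behind one socket, in GB."""
    return dimms * dimm_gb


def system_memory_tb(sockets=MAX_SOCKETS):
    """Maximum main memory in one shared-memory system, in TB (1TB = 1,024GB)."""
    return sockets * memory_per_socket_gb() / 1024


def qpi_link_usage():
    """How the four QPI links on each socket are spent, per the description above."""
    return {"other socket on the blade": 2, "Boxboro chipset": 1, "UltraViolet hub": 1}


if __name__ == "__main__":
    print(f"Memory per socket: {memory_per_socket_gb()}GB")
    print(f"Memory per system: {system_memory_tb():.0f}TB")
    print(f"QPI links used:    {sum(qpi_link_usage().values())}")
```

Eight 8GB DIMMs give the 64GB per socket, and 256 of those sockets give the 16TB ceiling.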
Rather than trying to make an Altix UV blade that can meet all possible I/O needs, SGI has made the I/O modular and implemented various options on riser cards. There are four different riser cards for the UV blades. One provides base I/O, while another has two hot-swap 2.5-inch drives. Another one offers two PCI-Express 2.0 peripheral slots (one x16 and one x8), while the final one offers a link to an external I/O chassis. This external I/O riser has two PCI Express 2.0 x16 links, which connect to an I/O expansion chassis with four slots. In all cases, SGI is supporting industry-standard PCI-Express peripheral cards.
MOE power to you
At 2,048 cores in a shared-memory system, the Altix UV has twice as many processor cores as the top-end Altix 4700s. The NUMAlink 4 fabric could scale to 1,024 cores and 2TB in a single system image.
SGI says that the 16-core Altix UV blade can deliver up to 145 gigaflops of number crunching power, which makes a 256-socket box come in at around 18.56 teraflops. That all gets packed into four server racks, each of which comprises two 16-blade enclosures, plus four NUMAlink routers on top of the racks.
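The packaging and performance arithmetic hangs together, as a quick sketch shows. It assumes nothing beyond the figures quoted in this story (four racks, two enclosures per rack, 16 blades per enclosure, two eight-core sockets per blade, and SGI's "up to" 145 gigaflops per blade); the names are purely illustrative.

```python
# Rough check of the blade, socket, core, and peak-flops figures quoted above.
# Names are illustrative assumptions; the numbers are the ones in the article.

RACKS = 4
ENCLOSURES_PER_RACK = 2
BLADES_PER_ENCLOSURE = 16
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 8        # eight-core Nehalem EX
GFLOPS_PER_BLADE = 145      # SGI's "up to" figure for a 16-core blade


def blade_count():
    """Blades in a full four-rack system."""
    return RACKS * ENCLOSURES_PER_RACK * BLADES_PER_ENCLOSURE


def socket_count():
    """Processor sockets in a full four-rack system."""
    return blade_count() * SOCKETS_PER_BLADE


def core_count():
    """Processor cores in a full four-rack system."""
    return socket_count() * CORES_PER_SOCKET


def peak_gigaflops():
    """Aggregate peak, taking the per-blade figure at face value."""
    return blade_count() * GFLOPS_PER_BLADE


if __name__ == "__main__":
    print(f"Blades:  {blade_count()}")
    print(f"Sockets: {socket_count()}")
    print(f"Cores:   {core_count()}")
    print(f"Peak:    {peak_gigaflops() / 1000:.2f} teraflops")
```

That is 128 blades, 256 sockets, and 2,048 cores, and 128 blades at 145 gigaflops apiece is where the 18.56 teraflops comes from. It is a peak number, not a measured one.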
That 16TB, by the way, is the upper limit of the Nehalem EX design, not one that SGI imposed. The current Itaniums can, in theory, address over 100TB of main memory.
According to Noer, this is by no means the upper limit of the Altix UV systems. To get to a 1-petaflop system, SGI can use the NUMAlink 5 interconnect to take four racks and their 256 sockets and link them in a fat tree group, and then use an 8x8 2D torus interconnect to lash together 32 racks to make a 16,384-core machine. In such a machine, each 256-socket piece has to run its own copy of an operating system. Noer says that the current Altix UV NUMAlink 5 hub has an upper limit of twice this amount, clustering 32,768 cores together. "We are happy to work with any customer who needs more than 256 sockets," Noer says with a laugh.
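The core counts for the bigger configuration follow from the same building blocks. The sketch below is an illustration only, assuming eight cores per socket and the four-rack, 256-socket group described above; the names are hypothetical and it says nothing about how SGI actually wires the torus.

```python
# Core-count arithmetic for the 32-rack configuration described above.
# Names are illustrative assumptions; the numbers are the ones in the article.

RACKS_PER_GROUP = 4         # four racks linked in a fat tree
SOCKETS_PER_GROUP = 256     # one shared-memory group, one operating system image
CORES_PER_SOCKET = 8        # eight-core Nehalem EX
TOTAL_RACKS = 32
HUB_LIMIT_FACTOR = 2        # Noer: the hub tops out at twice the 32-rack machine


def group_count(total_racks=TOTAL_RACKS):
    """Number of 256-socket groups, and hence operating system copies."""
    return total_racks // RACKS_PER_GROUP


def total_cores(total_racks=TOTAL_RACKS):
    """Cores across all groups lashed together by the torus."""
    return group_count(total_racks) * SOCKETS_PER_GROUP * CORES_PER_SOCKET


def hub_limit_cores():
    """Upper limit quoted for the current NUMAlink 5 hub."""
    return total_cores() * HUB_LIMIT_FACTOR


if __name__ == "__main__":
    print(f"Groups (OS images): {group_count()}")
    print(f"Cores in 32 racks:  {total_cores():,}")
    print(f"Hub upper limit:    {hub_limit_cores():,}")
```

Thirty-two racks works out to eight 256-socket groups, each with its own operating system copy, and 16,384 cores in all; doubling that gives the 32,768-core ceiling Noer cites.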
The UltraViolet hub doesn't just handle scalability inside the Altix UV system. The hub chipset also includes offload engines for the Message Passing Interface (MPI) protocol commonly used in parallel supercomputer applications, and Noer says that the MPI Offload Engine (MOE) implemented in the hub chipset can handle MPI reductions two to three times faster than competitive clusters and massively parallel systems, and can handle MPI barriers up to 80 times faster. The point is, instead of this MPI processing hitting the CPUs, it is handled in the hub, leaving more cycles available to do real work.
The ProPack MPI stack created by SGI for its Altix machines has been tweaked to automatically exploit the MOEs, and applications using Intel or HP MPI stacks can access the MOE functions through an API layer.
The Altix UV systems will support Novell SUSE Linux or Red Hat Enterprise Linux, and will run out of the box without any modifications to the Linux code. Intel expects to get the Nehalem EX chips into the field in the first quarter, and Noer says that SGI will be able to get its initial Altix UV machines into the field in the second quarter, with volume shipments in the third quarter. The company is, of course, happy to take orders now. And in fact, it has already lined up four customers who will talk and a bunch of others who want to stay on the hush-hush. "The pipeline is very robust for Altix UV," says Noer.
It won't be hard for this box to sell better than the Altix 4700s. That much is for sure. ®