Original URL: https://www.theregister.com/2010/05/05/sgi_altix_8400_supers/

SGI chills new Altix ICE supers

Adding Opterons alongside Xeons

By Timothy Prickett Morgan

Posted in Channel, 5th May 2010 20:21 GMT

As promised, Silicon Graphics is today shipping the Switzerland version of its Altix ICE blade server clusters, supporting either Intel or Advanced Micro Devices processors - or a mix of the two, if that makes you happy.

Once the old Silicon Graphics tossed away its own MIPS chips and Irix Unix variant in favor of Intel's Itanium processors and Novell's SUSE Linux, the company - a fairly low-volume server player - aligned itself tightly with Intel. Most would say too tightly, since the Itanium road had huge potholes that busted more than a few server axles along the way. But because SGI had neither the sway of HP, Dell, or IBM nor the budget to develop both a Xeon and an Opteron product line, when it finally - some would say belatedly - put out the Altix ICE line of blade clusters, it shipped machines using only Intel chips.

But flash forward a few years: Rackable Systems has eaten SGI and taken its name, and Rackable was always perfectly happy to ship either kind of x64 chip - it will even do VIA chips if that floats your flops or apps.

And that's why the fourth generation of the blade clusters, the Altix ICE 8400s, comes with either Xeon 5600 or Opteron 6100 processors. The 8400s don't sport the global shared memory of the Itanium-based Altix 4700 machines or the Xeon 7500-based Altix UV 100 and 1000 machines; they are just plain old clusters that SGI calls distributed-memory systems to distinguish them from its shared-memory systems.

From the outside, the Altix 8400s look a lot like the Altix 8200s they replace. But there are lots of changes underneath the metal skins of the chassis that go beyond swapping in new processors and chipsets. The Altix 8400s use a 10U chassis that holds 16 compute blades, just like the Altix 8200s did, and they come in an LX flavor with a single-plane InfiniBand network or an EX flavor with two planes.

The 8400LX chassis has two quad data rate (40Gb/sec) InfiniBand switches per enclosure, and the single plane means that server and storage traffic are all parked on the same network. The 8400EX chassis has room for four InfiniBand QDR switches and has two planes. The new 8400 chassis has a faster midplane linking the blades to the switches, which means the extra bandwidth that comes with QDR InfiniBand can actually be used.

According to Bill Mannel, vice president of product marketing at SGI, some customers use the two planes in the EX chassis for failover, some use them to separate MPI traffic from storage traffic, and still others just double up the bandwidth and mix all the traffic on one fatter network. The Altix ICE machines have a separate Gigabit Ethernet network implemented on their system boards that is used for system administration, so admins don't gum up the HPC workloads or vice versa.

The Altix 8400 blades mount two abreast and horizontally in the LX or EX chassis. The Intel blades can be equipped with quad-core Xeon 5500s or six-core Xeon 5600s; each blade has a dozen memory slots, supporting up to 96GB using 8GB memory sticks. Unlike many other blade servers, SGI can support top-bin 130 watt Xeon 5600 parts in the Altix 8400s. Up to 768 cores can be put into a rack with four of the 16-blade chassis, for just over 10 teraflops of aggregate floating point performance.
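
If you want to sanity-check that rack-level math, a quick back-of-the-envelope sketch works out like this. The 3.33GHz clock and four double-precision operations per core per clock are assumptions for a top-bin six-core Xeon 5600, not figures SGI supplied:

```python
# Back-of-envelope check on the Xeon rack numbers above. Illustrative only:
# the clock speed and flops-per-clock are assumptions, not SGI-quoted figures.
cores_per_socket = 6
sockets_per_blade = 2
blades_per_chassis = 16
chassis_per_rack = 4

cores_per_rack = (cores_per_socket * sockets_per_blade
                  * blades_per_chassis * chassis_per_rack)
print(cores_per_rack)                 # 768 cores per rack

clock_ghz = 3.33                      # assumed top-bin Xeon 5680
flops_per_core_per_clock = 4          # two 128-bit SSE pipes, double precision
peak_teraflops = cores_per_rack * clock_ghz * flops_per_core_per_clock / 1000
print(round(peak_teraflops, 1))       # ~10.2 teraflops, or "just over 10"
```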

The AMD blades are based on the Opteron 6100s, which come in versions with eight or twelve cores. The AMD blade has 16 memory slots, or 128GB per blade using 8GB sticks. The Xeon blades are available now, but the AMD blades will not be generally available until the third quarter.

But when the AMD blades do arrive, you will be able to jam-pack 1,536 cores into a single Altix 8400 rack. Forget about the top-bin Opteron 6176 SE part, rated at 105 watts on AMD's average CPU power (ACP) scale, because when you convert to Intel's thermal design point (TDP) metric it is probably something more like 137 watts. So take the 80 watt (ACP), 2.2GHz Opteron 6174 part instead.

All the Opteron cores can do four double-precision floating-point operations (peak) per clock, so with a dozen 2.2GHz cores you are looking at 105.6 gigaflops per socket - with 16 blades per chassis and two sockets per blade, that's 3.38 teraflops per chassis or 13.5 teraflops per rack. Moving up to the 2.3GHz Opteron 6176 SE only boosts this to 14.1 teraflops per rack, a gain of 4.5 per cent in number-crunching power, but the heat coming off the chip goes up by a third and the chip price rises by a fifth.
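
Here is that same arithmetic as a quick sketch comparing the two Opteron options. This is not SGI's sizing tool, and the 80 watt and 105 watt power figures are AMD's ACP ratings rather than anything from SGI's spec sheets:

```python
# Rough comparison of the two Opteron options above (a sketch, not SGI's tool).
def rack_teraflops(clock_ghz, cores_per_socket=12, flops_per_clock=4,
                   sockets_per_blade=2, blades_per_chassis=16, chassis_per_rack=4):
    gflops = (clock_ghz * cores_per_socket * flops_per_clock
              * sockets_per_blade * blades_per_chassis * chassis_per_rack)
    return gflops / 1000

print(round(rack_teraflops(2.2), 1))   # Opteron 6174: ~13.5 teraflops per rack
print(round(rack_teraflops(2.3), 1))   # Opteron 6176 SE: ~14.1 teraflops per rack
print(round(2.3 / 2.2 - 1, 3))         # 0.045: the 4.5 per cent gain
print(round(105 / 80, 2))              # 1.31: roughly a third more heat (ACP) for it
```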

It isn't worth it, unless you are a bank and every nanosecond counts and is worth millions of dollars.

Mannel says that SGI has no plans to support the Xeon 7500, which has a lot more oomph but runs hotter and costs a lot more, in the Altix 8400 blades. Nor does it have any plans at this time to support the impending "Lisbon" Opteron 4100 processors, which will have four or six cores per socket and higher clock speeds, and will be a lot less expensive as well as cooler-running. SGI may be missing out on an opportunity by not doing the Opteron 4100s.

Whether you pick the Xeon or Opteron blades, you can choose one of three Mellanox ConnectX-2 InfiniBand host channel adapters (HCAs), which snap into mezzanine slots on the Altix 8400 blades. The first is a single-port HCA, the cheapest model, used in single-plane InfiniBand networks. Mannel says that lots of competitors in the server racket put dual-port HCAs on their blades even when they are implementing single-plane topologies, which is just wasted money.

The second option is a dual-port HCA from Mellanox that is used for dual-plane configurations. The trouble is that this HCA has only one PCI Express connection back to the blade, and that becomes a choke point for applications that do a lot of I/O. And so SGI is also peddling a special variant that puts two single-port HCAs on a single mezzanine card, each with its own PCI Express connection. This provides twice the InfiniBand bandwidth coming off the blade.
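
To put rough numbers on that choke point: the sketch below assumes the mezzanine HCAs hang off a PCI Express 2.0 x8 link, which is not something SGI or Mellanox is quoted on here, so treat the figures as illustrative rather than spec-sheet fact:

```python
# Rough numbers on the dual-port choke point (assumes a PCIe 2.0 x8 link).
QDR_DATA_GBPS = 32        # 40Gb/sec QDR signalling, ~32Gb/sec after 8b/10b coding
PCIE2_LANE_GBPS = 4       # PCIe 2.0: 5GT/sec per lane, ~4Gb/sec of data per direction
pcie_x8_gbps = 8 * PCIE2_LANE_GBPS

print(pcie_x8_gbps)                        # 32Gb/sec: one QDR port just about fits
print(2 * QDR_DATA_GBPS > pcie_x8_gbps)    # True: two ports behind one link oversubscribe it
```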

The InfiniBand switches in both the Altix 8400 LX and EX chassis support fat tree, all-to-all, hypercube, and enhanced hypercube topologies. The hypercube setups are better than fat tree configurations because you can add server nodes to the cluster without having to rewire all the connections. This is a key differentiator for HPC shops that are running jobs all the time and do not want to take the system down.
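
For readers who have not bumped into hypercube interconnects, the sketch below shows the addressing scheme in generic terms. It is an illustration of the topology, not SGI's implementation, but it shows why growing a hypercube adds links rather than rewiring the ones already in place:

```python
# Generic hypercube addressing: in a d-dimensional hypercube, a node's
# neighbours are the nodes whose labels differ from its own in exactly one bit.
def hypercube_neighbours(node, dimensions):
    return [node ^ (1 << bit) for bit in range(dimensions)]

print(hypercube_neighbours(0, 3))   # [1, 2, 4]: three links in a 3-D cube

# Growing from d to d+1 dimensions doubles the node count, and each existing
# node gains exactly one new link (to its mirror in the new half); the links
# it already had are untouched, which is why nothing needs rewiring.
print(hypercube_neighbours(0, 4))   # [1, 2, 4, 8]: the old links plus one new one
```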

The hypercube topology and the live integration it engenders have been used on the "Pleiades" massively parallel machine at NASA Ames, which reckons it kept about 2 million CPU-hours of work in production late last December when a 512-core rack was added to the system. Jobs typically take five days to run on the super, and without live expansion you have to stop queuing work well ahead of an upgrade, which means many nodes sit idle for days as the queue drains.
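
As a back-of-the-envelope reading of that 2 million CPU-hour figure, using only the five-day job length quoted above, the saving is equivalent to keeping a sizeable chunk of the machine busy through what would otherwise be a multi-day drain. The idle-core count that falls out below is an inference, not a number NASA published:

```python
# Rough reading of the 2 million CPU-hour figure; the idle-core count is an
# inference from the article's own numbers, not a NASA-published statistic.
drain_hours = 5 * 24                 # jobs run up to five days, so draining the
                                     # queue takes roughly that long
idle_cores = 2_000_000 / drain_hours
print(round(idle_cores))             # ~16,667 cores' worth of capacity saved from idling
```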

Mannel says that hypercube and enhanced hypercube topologies are well suited to machines with a large number of nodes, and that all-to-all topologies provide the highest bandwidth and lowest latency on Altix 8400 clusters with up to 128 nodes. Fat tree topologies have the highest networking costs and require external switches, but are well-suited for MPI algorithms that need all-to-all communications on a much larger scale than 128 nodes.

The Altix blade setups don't just include compute nodes; there is also a hierarchy of other nodes, including a service node, a rack leader node, and a system administration controller. Customers can also swap in Xeon 7500-based Altix UV 10 four-socket "fat nodes" and a variety of rack appliances stuffed with Nvidia and AMD GPUs. These all plug into the InfiniBand network as peers.

The Altix 8400s support the same SGI ProPack extensions for Novell SUSE Linux Enterprise Server and Red Hat Enterprise Linux as prior Altix ICE machines. SGI's own Tempo tool is used for systems management, Altair's PBS Pro for workload management, and Intel's MPI runtimes for message passing.
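
For the uninitiated, the snippet below gives a minimal flavour of what message passing on a cluster like this looks like. It uses the generic mpi4py bindings purely as an illustration - it is not SGI's or Intel's tooling, and production codes on these machines are more typically written in C or Fortran:

```python
# Minimal MPI illustration (generic mpi4py, not SGI's or Intel's stack).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id within the job
size = comm.Get_size()      # total number of MPI ranks launched

# Each rank contributes a value and rank 0 gathers the sum over the interconnect.
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print(f"{size} ranks, sum of ranks = {total}")
```

A job like this would be launched across the blades with something like mpirun -np 768, with the MPI library running its point-to-point and collective operations over the InfiniBand fabric.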

SGI, as is sometimes its bad habit, did not provide pricing for the Altix 8400s. But Mannel says that like-for-like configurations of the Altix 8200 machines (using 20Gb/sec InfiniBand and a slower midplane) would cost more money and that the Altix 8400s are "aggressively priced" compared to alternatives in the HPC racket. Mannel says that SGI can also boast that the Altix 8400s can scale up to 32,000 nodes and have three times the switch bandwidth per node compared to other QDR InfiniBand systems. ®