Original URL: http://www.theregister.co.uk/2009/06/23/sun_infiniband_hpc/

Sun buffs InfiniBand for Constellation supers

Over two petaflops sold

By Timothy Prickett Morgan

Posted in Servers, 23rd June 2009 22:55 GMT

The second-generation InfiniBand switch that Sun Microsystems has been showing off since last November made its debut this morning at the International Supercomputing Conference in Hamburg, Germany. The new switch - coupled to new servers based on Intel's "Nehalem EP" Xeon 5500 processors as well as existing quad-core "Shanghai" Opterons (with six-core variants coming soon) - is the core of the upgraded Constellation HPC clusters that Sun has been pushing for two years as a means of getting back into the supercomputing space.

The new "Project M2" 648-port modular quad data rate InfiniBand switch - as well as two low-end fixed-port switches that run their ports at the same 40 Gb/sec speed - are all based on new InfiniBand protocol chips made by Mellanox. (That vendor launched its own line of switches that span up to the same 648 ports running at QDR speeds yesterday ahead of ISC '09). Sun was previewing its QDR InfiniBand switches as well as its Nehalem EP blade servers and some integrated storage (with solid state drives) aimed at HPC customers, and now, it is ready to start shipping boxes.

According to Michael Brown, marketing manager for HPC at Sun, the company has sold over 2 petaflops of Constellation machinery and about half of that is based on the new Nehalem machines that were announced two months ago and the new QDR InfiniBand switches. "That's a pretty big chunk of business," says Brown with a certain amount of satisfaction.

To be fair, the Constellation boxes have been a bright spot for Sun, which is finally getting some play on the Top 500 list of supercomputers. About a quarter of the petaflops that Sun has shipped or that are on order for Constellation boxes come from one machine, the "Ranger" Constellation box at the University of Texas, with a few other big deals contributing tens of teraflops on top of that. Constellation needs a lot more sales, as do Sun's generic rack and blade servers for customers who don't want to adopt InfiniBand and who might prefer cheap Gigabit Ethernet or, alternatively, 10 Gigabit Ethernet switching.

A single Constellation rack holds 48 full-height or 96 half-height blade servers, plus the switching and storage, for a maximum of 768 cores using Nehalem EP Xeon or Shanghai Opteron processors. Various labs are thinking well below the petaflops performance level that IBM, Cray, Silicon Graphics, and Sun (and, to a lesser extent, Dell and Hewlett-Packard) are chasing, and are looking at buying Constellation machines that span only one or two racks. The adoption of the six-core Istanbul Opterons in the X6240 and X6440 blade servers sometime in the next quarter - which will require only a BIOS update on the blades - certainly won't hurt sales of smaller racks, since it will let customers pack 1,152 cores into a rack.
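For those who want to check the math, here is a minimal sketch in C of how those core counts fall out. The two-socket-per-blade figure is our assumption for illustration, not a number from Sun's spec sheet.

/* Per-rack core counts, assuming two sockets per half-height blade. */
#include <stdio.h>

int main(void) {
    int blades  = 96;  /* half-height blades per Constellation rack */
    int sockets = 2;   /* assumed two-socket blades */

    printf("Quad-core (Nehalem EP / Shanghai): %d cores\n", blades * sockets * 4);  /* 768 */
    printf("Six-core (Istanbul):               %d cores\n", blades * sockets * 6);  /* 1,152 */
    return 0;
}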

Brown says that Sun's HPC business is more than just Constellation boxes, but he was not at liberty to say what percentage of Sun's HPC sales come from outside of Constellation setups. As an example, he says that the University of North Carolina at Chapel Hill has bought seventeen of Sun's X4600 Opteron servers (which each have 16 cores), plus some storage and its Grid Engine grid software, to make a baby cluster. This setup at UNC includes 45 Sun workstations as well as a mix of storage, and it harks back to the kinds of deals Sun used to do all the time back in the 1990s, deals that made it a name in academic computing right beside Digital Equipment.

And on the left coast...

The University of California at San Diego is also using 32-core X4600 machines as the basis of a cluster that has 512 GB of main memory per node, something Sun can't do on its Xeon or Opteron blades. There are some Sparc-based clusters here and there, too, particularly in financial services, where they are used to run economic simulations as part of trading systems.

The official name of the Project M2 QDR switch is the Datacenter InfiniBand Switch 648, and it has a starting list price of $70,495. The switch fits in an 11U rack chassis and uses 12x consolidation cables - each carrying three 4x InfiniBand ports - so only 216 cables are needed to wire up all 648 ports. The chassis can be equipped with up to nine 72-port line cards and up to nine vertically mounted fabric cards, for a total of 41 Tb/sec of aggregate bi-directional bandwidth.

Up to eight of these switches can be linked together to create an InfiniBand fabric that can span 5,184 ports. With each server node presumably having one port and two sockets with either four or six cores, we're talking about an HPC cluster that can span from 41,472 to 62,208 cores. That is a very large system, on the order of 400 to 500 teraflops, depending on the processor clock speeds.
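Here is the same back-of-the-envelope arithmetic for the big fabric, again as a rough C sketch. The four-flops-per-clock and 2.6 GHz figures are illustrative assumptions on our part, not Sun's published numbers.

/* Switch and fabric arithmetic; flops-per-clock and clock speed are
 * illustrative assumptions, not Sun's published figures. */
#include <stdio.h>

int main(void) {
    int ports_per_switch = 9 * 72;               /* nine 72-port line cards = 648 ports */
    int cables           = ports_per_switch / 3; /* one 12x cable carries three 4x ports = 216 */
    int fabric_ports     = 8 * ports_per_switch; /* eight linked switches = 5,184 ports */

    int cores_quad = fabric_ports * 2 * 4;       /* one port per two-socket quad-core node = 41,472 */
    int cores_six  = fabric_ports * 2 * 6;       /* six-core nodes = 62,208 */

    /* Assume roughly 4 flops per core per clock at about 2.6 GHz. */
    double tflops = cores_quad * 4.0 * 2.6e9 / 1e12;

    printf("ports/switch: %d, cables: %d, fabric ports: %d\n",
           ports_per_switch, cables, fabric_ports);
    printf("cores: %d (quad) to %d (six), roughly %.0f teraflops\n",
           cores_quad, cores_six, tflops);
    return 0;
}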

The top-end 648-port InfiniBand switch is designed and manufactured by Sun, according to Brown, as are the InfiniBand Switch 72 and InfiniBand Switch 36 fixed-port switches that Sun is also showing off at ISC '09 today. These are based on the latest Mellanox chips and feature QDR InfiniBand speeds as well. (The exact specs and prices for these two switches were not available at press time).

At ISC '09, Sun is previewing a new flash disk array with 2 TB of capacity that comes in a 1U chassis and that Sun says delivers enough I/O operations per second and data bandwidth to replace around 3,000 disk drives, while burning only around 300 watts. Sun is also previewing a new Storage 7000 array for HPC customers that will span up to 1.5 PB in capacity and will have full redundancy - multiple head nodes, multiple interconnects, and such - built in. No word on when these two will ship.

On the HPC software front, Sun is rolling out Lustre 1.8.0, which has been tweaked so it understands the flash memory Sun has sprinkled into its open storage arrays. The new Lustre clustered file system also has a number of nips and tucks to boost performance and improve usability, including a new adaptive timeout feature and version-based recovery of data stored on the file system.

Sun is also announcing its HPC Software Linux Edition 2.0 software stack, which runs on Red Hat, CentOS, or SUSE Linux. Exactly how this bundle of tools differs from the 1.2 release of the HPC stack from Sun is not clear, since the feeds and speeds are not up yet for it. (You can see all the details about the 1.2 release here).

Sun is also pushing its Grid Engine grid software to Release 6.2 Update 3, which adds the ability to bring compute capacity on Amazon's EC2 compute cloud - as well as on internal clouds that are compatible with EC2 - into a Grid Engine cluster. Sun's own Studio 12 development tools have been given an Update 1, which has lots of performance tweaks for parallel programming on the latest x64 and Sparc processors. And the HPC ClusterTools 8.2 includes MPI libraries and runtimes that are based on the Open MPI spec, tested and supported by Sun for both Solaris and Linux.

The HPC ClusterTools have also been tweaked to support QDR InfiniBand and IB multi-rail, which is a multipathing technology for InfiniBand that allows a server with two ports to send traffic over both at the same time. The HPC ClusterTools now also offer support for PathScale and Intel compilers as well as Sun's Studio compiler and the open source GNU compilers. Finally, Sun has packaged up some HPC tools and its latest OpenSolaris release into a little something it calls HPC Software Developer Edition 1.0, which gives developers a single CD from which they can get all the tools they need to start coding parallel applications. ®
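For the curious, the MPI layer in ClusterTools is the familiar Open MPI programming model, so a bare-bones MPI program of the sort it builds and launches looks like the sketch below. The mpicc and mpirun commands shown in the comments are the standard Open MPI front ends, given here as an illustrative example rather than a Sun-documented workflow.

/* hello_mpi.c - a minimal MPI program.
 * Typical Open MPI build and launch (an example, not Sun-specific):
 *   mpicc -o hello_mpi hello_mpi.c
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}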