Cray mimics Ethernet atop SeaStar interconnect

Linux shortcut cooks with SLES

Preview of things to come

Bolding warns that at the moment this is a "feature release," which means the Cluster Compatibility Mode is really a technology preview. He adds that the clone TCP/IP stack riding atop the SeaStar interconnect "can provide reasonable performance for a relatively small number of nodes," but cautions that on very large XT implementations, customers are going to want to fall back on what Cray is now calling Extreme Scalability Mode - recompiling the Linux applications to have their nodes talk directly through the SeaStar interconnect. CCM can scale to 2,048 cores on the TCP/IP stack now, which means somewhere between 85 and 170 nodes, depending on the Opteron processors customers choose.
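The 85-to-170 node range comes from dividing the 2,048-core ceiling by the number of cores per node. Here is a minimal sketch of that arithmetic. It assumes two-socket nodes: two six-core Opterons (XT5-style, 12 cores) or two twelve-core Opteron 6100s (XT6-style, 24 cores). Those per-node figures are our assumption, not numbers Cray gave. An eight-core 6100 configuration would land in between.

```python
# Core ceiling for Cluster Compatibility Mode's TCP/IP stack, per the article.
CCM_CORE_LIMIT = 2048

# Assumed cores per two-socket node (an illustration, not a Cray figure):
# two six-core Opterons (XT5-style), two eight-core or two twelve-core
# Opteron 6100s (XT6-style).
NODE_TYPES = {
    "2 x 6-core": 12,
    "2 x 8-core": 16,
    "2 x 12-core": 24,
}


def max_nodes(core_limit: int, cores_per_node: int) -> int:
    """Largest number of whole nodes whose cores fit under core_limit."""
    return core_limit // cores_per_node


for label, cores in NODE_TYPES.items():
    print(f"{label}: {max_nodes(CCM_CORE_LIMIT, cores)} nodes")
```

Floor division is used because a partial node cannot be added without crossing the core ceiling. That is why 24-core nodes top out at 85 rather than 86.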

Next year, Cluster Compatibility Mode will get a whole lot more interesting, when Cray adds support for the OpenFabrics Enterprise Distribution (OFED) drivers for InfiniBand, much as it supports TCP/IP drivers today. One of the key features of InfiniBand that Ethernet still does not have (but soon will) is Remote Direct Memory Access. RDMA lets one server node read from and write to another node's memory directly through the InfiniBand adapters, bypassing the operating system's network stack entirely and offering much lower latency than even 10 Gigabit Ethernet. In essence, with support for the OFED drivers, Cray's Cluster Compatibility Mode will allow the SeaStar interconnect to emulate InfiniBand and yield much better performance than the emulated TCP/IP stack being offered initially with CLE 3.0.

"Once we have the OFED drivers, we think we can come very close to our native communications speed," says Bolding. No word yet on how far the emulated InfiniBand will scale in terms of processor nodes, but it has to scale pretty far to be worth the trouble.

Cray has been working on Cluster Compatibility Mode for the past two years, and Bolding admits that this clever network emulation would have helped Cray expand its addressable market. But at the time, Cray was more concerned with breaking the petaflops barrier at big supercomputing centers like Oak Ridge National Laboratory, which are paying the current bills.

Cray has high hopes for Cluster Compatibility Mode. "We think this will take away the fear of getting a Cray system," explains Bolding. "We have removed cost as a concern over the past few years, and when we did, some customers feared that they would end up getting something that was not compatible with other Linux machines."

Well, of course, the customers were right in this regard. But if the OFED drivers running atop SeaStar and emulating InfiniBand work as well as Bolding says they can, this would indeed be another barrier down. Provided, of course, that the SeaStar interconnect has enough oomph for emulated InfiniBand to perform as well as or better than the real thing. By the way, the emulated Ethernet and InfiniBand drivers will support multiple MPI stacks, so you are not locked in.

CLE 3.0 will initially only ship on the new XT6 and XT6m parallel supers, which use blade servers based on the brand-new twelve-core "Magny-Cours" Opteron 6100 processors from Advanced Micro Devices. The XT6 nodes were previewed last fall at the SC09 supercomputing trade show; Cray has not said that the XT6 nodes are actually shipping yet in volume.

Later this year, Cray will support CLE 3.0 on the XT5 supers, which use an earlier six-core Opteron generation but the same SeaStar2+ interconnect that was carried over to the XT6 nodes. In early 2011, Cray will support CLE 3.0 on XT4 supers, but it has no plans to support it on XT3 machines. It is a matter of testing and qualification, which Cray is not going to spend money on with so few XT3 machines still in the field. If you want to run an emulated Ethernet MPI stack, you will have to move from an XT3 up to an XT4 or later.

Presumably the combination of the upcoming "Gemini" interconnect and the XT6 nodes, which make up the "Baker" family of Opteron supers slated for later this year, will include some sort of hardware assistance to speed up the emulated Ethernet or InfiniBand that Cluster Compatibility Mode offers inside CLE 3.0. Bolding did not say.

CLE 3.0 has a number of other enhancements. First, it includes Oracle's open source Lustre 1.8 clustered file system, and it also supports IBM's General Parallel File System (GPFS) and Panasas clustered file systems. GPFS and Panasas are new; the Cray XTs have been running Lustre since their inception. CLE 3.0 is also designed to scale across 500,000 cores in a parallel cluster, up from a 200,000-core ceiling with CLE 2.0. Finally, CLE 3.0 includes a diagnostic tool called NodeKARE, short for Node Knowledge and Reconfiguration, which makes sure jobs are scheduled to run only on nodes that are behaving themselves and not acting all wobbly.
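Cray has not described how NodeKARE works internally. The sketch below only illustrates the general idea of health-gated scheduling: a node is eligible for a job only if its diagnostics passed. It assumes each node reports a set of pass/fail check results. The `Node`, `schedulable` and `place_job` names, and the check names, are hypothetical and are not Cray APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node and the results of its (hypothetical) diagnostic checks."""
    name: str
    checks: dict[str, bool] = field(default_factory=dict)

    def healthy(self) -> bool:
        # Require at least one recorded check, and every check must have passed;
        # nodes that were never tested are treated as untrustworthy.
        return bool(self.checks) and all(self.checks.values())


def schedulable(nodes: list[Node]) -> list[str]:
    """Names of the nodes that passed their health checks, in input order."""
    return [n.name for n in nodes if n.healthy()]


def place_job(nodes: list[Node], nodes_needed: int) -> list[str]:
    """Pick the first `nodes_needed` healthy nodes, or fail if there are too few."""
    candidates = schedulable(nodes)
    if len(candidates) < nodes_needed:
        raise RuntimeError(
            f"need {nodes_needed} healthy nodes, only {len(candidates)} available"
        )
    return candidates[:nodes_needed]


if __name__ == "__main__":
    cluster = [
        Node("node1", {"memory": True, "interconnect": True}),
        Node("node2", {"memory": False, "interconnect": True}),  # failed memory test
        Node("node3"),  # no diagnostics recorded yet
        Node("node4", {"memory": True, "interconnect": True}),
    ]
    print(schedulable(cluster))
    print(place_job(cluster, 2))
```

Excluding nodes with no recorded checks is a conservative design choice for this sketch. A real tool might instead run the missing diagnostics before deciding.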

What Cray has not said is whether it will offer a Cluster Compatibility Mode in conjunction with Microsoft for its XT line of supers. This would clearly be very useful. Although Cray supports Windows HPC Server 2008 on its baby and midrange lineup, this Windows variant is not supported on the massively scalable XT line. But over the long haul, that will have to be a goal for the company, since the point of having an entry and midrange super line is to get the customers in and grow them up to full-scale, massively parallel machines as their workloads expand. ®
