Intel stretches HPC dev tools across chubby clusters

Cluster Studio XE ready for MICs, not for GPUs


SC11 Supercomputing hardware and software vendors are getting impatient for the SC11 supercomputing conference in Seattle, which kicks off next week. More than a few have jumped the gun with product announcements this week, including chipmaker Intel.

No, Intel is not going to launch its "Sandy Bridge-EP" Xeon E5 processors, which are expected early next year. But the new Cluster Studio XE toolset for HPC customers will help those lucky few HPC and cloud shops that have been able to get systems this year to squeeze more performance out of their Xeon E5 clusters.

The Cluster Studio XE stack includes a slew of Intel tools for creating, tuning, and monitoring parallel applications running on x86-based parallel clusters. Intel had already been selling a set of application tools called Cluster Studio, which bundled up the chip giant's C, C++, and Fortran compilers, its rendition of the message passing interface (MPI) messaging protocol that allows server nodes to share work, and various math and multithreading libraries to goose the performance of applications.

With the XE (Extended Edition) of the HPC cluster tools, Intel is goosing the performance of the MPI library, and claims its MPI 4.0.3 stack is anywhere from 3.3 to 6.5 times as fast as the OpenMPI 1.5.4 and MVAPICH2 1.6 MPI stacks from the open source community. Benchmark tests were done on a 64-node system running 768 processes and linked by InfiniBand switches.

Intel also tested the Platform Computing MPI 8.1.1 stack against the three MPI stacks listed above, only this time on an eight-node system; in this case the performance differences between Intel and Platform (which is now owned by IBM) were not huge. Against the Microsoft MPI 3.2 stack on the same iron, the Intel MPI stack running on Windows servers was anywhere from 2.17 to 2.74 times faster than the Microsoft MPI.

The updated Intel MPI stack can scale to over 90,000 MPI cores, and also has hooks into the open source SLURM job scheduler that was created by Lawrence Livermore National Laboratory because of its frustration with closed-source job schedulers and the state of the open source ones.

With the Cluster Studio XE roll-up, the Inspector and Debugger modules now have cluster-level data gathering and reporting, instead of just seeing things at a node level. What this means, in plain American, is that these add-ons to the compilers can look for memory leaks and threading errors across a cluster of machines without sending the HPC application programmer on a wild goose chase to locate performance issues or crashes on an individual node. (With 90,000 cores, which works out to 5,625 two-socket nodes using the future eight-core Xeon E5 processors, you can't look for these issues manually.)
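The node count in that parenthetical is simple division, but it only comes out to 5,625 if you assume two processor sockets per node, which the article does not state outright. A minimal Python sketch of the arithmetic, with the function name and the two-socket default as illustrative assumptions:

```python
import math

def nodes_needed(total_cores, cores_per_socket=8, sockets_per_node=2):
    """Return how many nodes are needed to supply total_cores.

    Assumption (not stated in the article): two-socket nodes.
    Rounds up, since a partly used node is still a whole node.
    """
    cores_per_node = cores_per_socket * sockets_per_node
    return math.ceil(total_cores / cores_per_node)

# 90,000 cores on two-socket, eight-core machines
print(nodes_needed(90_000))
```

With single-socket nodes the same call, `nodes_needed(90_000, sockets_per_node=1)`, doubles the count, which is why the two-socket assumption matters.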

The Trace Analyzer and Collector module can now look at MPI performance across the nodes in a cluster and evaluate how well MPI is load balancing across the nodes. The VTune Amplifier, which is a tool that Intel uses to visualize the threading behavior in a single node, can now show threading issues across the cluster.
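To give a feel for what "how well MPI is load balancing" can mean in numbers, here is a rough Python sketch of one common imbalance measure: the slowest rank's busy time divided by the average. This is not how Trace Analyzer and Collector computes anything; the function names, the 1.25 tolerance, and the sample timings are all hypothetical.

```python
from statistics import mean

def load_imbalance(rank_seconds):
    """Return max/mean of per-rank busy time; 1.0 means perfectly balanced."""
    if not rank_seconds:
        raise ValueError("need at least one rank")
    avg = mean(rank_seconds)
    if avg == 0:
        return 1.0  # nobody did any work, so nothing is out of balance
    return max(rank_seconds) / avg

def stragglers(rank_seconds, tolerance=1.25):
    """Return the indices of ranks whose busy time exceeds tolerance * mean."""
    avg = mean(rank_seconds)
    return [rank for rank, t in enumerate(rank_seconds) if t > tolerance * avg]

# Hypothetical per-rank busy times in seconds; rank 3 is the slow one.
timings = [10, 10, 10, 30]
print(load_imbalance(timings), stragglers(timings))
```

A ratio well above 1.0 says most ranks are sitting idle waiting for the slowest one, which is the kind of thing that is easy to miss when you can only look at one node at a time.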

The Cluster Studio XE bundle includes the Intel v12.1 compilers that were launched in September, which offered between 22 and 27 per cent better performance on Fortran benchmarks and from 6 to 11 per cent on C/C++ integer performance compared to the v12.0 releases running on Linux and Windows machines. C/C++ floating point performance improvements were a few points. Intel claims it has a considerable performance advantage over other compilers – anywhere from 21 to 47 per cent faster code execution on C, C++, and Fortran tests. And that performance is not just tied to Intel's own Xeon processors.

Perhaps more significantly, on Fortran, Intel now believes it has the performance edge over Portland Group 11.4 and Absoft 11.1 on either Windows or Linux machines. The performance jump is particularly acute on Windows machines running C++.

"We believe that we have the best performance, regardless of the type of x86 chip," James Reinders, evangelist for Intel's software division, tells El Reg.

The v12.1 compilers are tuned up for the forthcoming Xeon E5 processors, and even though Intel has not been able to get its hands on machines using AMD's impending "Interlagos" Opteron 6200 processors to tune and test them, Reinders says that he is confident that the compilers and the Cluster Studio XE tools will wring more flops out of these AMD chips than the alternatives.

The interesting twist in all this is that the Cluster Studio compilers and tuning and visualization tools cannot peer into GPU coprocessors, and Reinders says he is not even sure how Intel would go about doing that. But because the future "Knights" x86-based coprocessors are based on the same architecture as Intel and AMD chips, Cluster Studio XE tools will be able to see into these MIC coprocessors and help coders tweak and tune their apps for them.

The normal Cluster Studio stack, which includes the Intel compilers as well as the math and clustering libraries, costs $1,849 per developer on a Linux workstation and $1,499 per developer on a Windows workstation. There is no runtime or royalty charge for having the tools run on a parallel x86 cluster. If you want to go all the way to the Cluster Studio XE stack, then you pay $2,849 per developer on Linux and $2,499 on Windows. Yes, the Windows versions are cheaper. ®
