Intel stretches HPC dev tools across chubby clusters

Cluster Studio XE ready for MICs, not for GPUs


SC11 Supercomputing hardware and software vendors are getting impatient for the SC11 supercomputing conference in Seattle, which kicks off next week. More than a few have jumped the gun with product announcements this week, including chipmaker Intel.

No, Intel is not going to launch its "Sandy Bridge-EP" Xeon E5 processors, which are expected early next year. But the new Cluster Studio XE toolset for HPC customers will help those lucky few HPC and cloud shops that have been able to get systems this year to squeeze more performance out of their Xeon E5 clusters.

The Cluster Studio XE stack includes a slew of Intel tools for creating, tuning, and monitoring parallel applications running on x86-based parallel clusters. Intel had already been selling a set of application tools called Cluster Studio, which bundled up the chip giant's C, C++, and Fortran compilers, its rendition of the message passing interface (MPI) messaging protocol that allows server nodes to share work, and various math and multithreading libraries to goose the performance of applications.

With the XE (Extended Edition) of the HPC cluster tools, Intel is goosing the performance of the MPI library, and claims its MPI 4.0.3 stack is anywhere from 3.3 to 6.5 times as fast as the OpenMPI 1.5.4 and MVAPICH2 1.6 MPI stacks from the open source community. Benchmark tests were done on a 64-node system running 768 processes and linked by InfiniBand switches.

Intel tested the Platform Computing MPI 8.1.1 stack against the three MPI stacks listed above, only this time on an eight-node system; in this case the performance differences between Intel and Platform (which is now owned by IBM) were not huge. With the Microsoft MPI 3.2 stack on the same iron, the Intel MPI stack running on Windows servers was anywhere from 2.17 to 2.74 times faster than the Microsoft MPI.

The updated Intel MPI stack can scale to over 90,000 MPI cores, and also has hooks into the open source SLURM job scheduler that was created by Lawrence Livermore National Laboratory because of its frustration with closed-source job schedulers and the state of the open source ones.

With the Cluster Studio XE roll-up, the Inspector and Debugger modules now have cluster-level data gathering and reporting, instead of just seeing things at a node level. What this means, in plain American, is that these add-ons to the compilers can look for memory leaks and threading errors across a cluster of machines without sending the HPC application programmer on a wild goose chase to locate performance issues or crashes on an individual node. (With 90,000 cores, which works out to 5,625 two-socket nodes using the future eight-core Xeon E5 processors, you can't look for these issues manually.)
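For anyone who wants to check the node count, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and the two-sockets-per-node figure is an assumption inferred from the numbers above rather than something Intel has stated.

```python
def nodes_needed(total_cores: int, cores_per_socket: int, sockets_per_node: int) -> int:
    """Return how many nodes are needed to host total_cores, rounding up."""
    cores_per_node = cores_per_socket * sockets_per_node
    # Ceiling division, so a partly filled node still counts as a node.
    return -(-total_cores // cores_per_node)


# Assumption: two-socket nodes, each socket holding one eight-core Xeon E5.
nodes = nodes_needed(90_000, cores_per_socket=8, sockets_per_node=2)
print(nodes)  # 5625
```

Swap in a different socket count to see how quickly the node count grows: single-socket boxes would double it.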

The Trace Analyzer and Collector module can now look at MPI performance across the nodes in a cluster and evaluate how well MPI is load balancing across the nodes. The VTune Amplifier, which is a tool that Intel uses to visualize the threading behavior in a single node, can now show threading issues across the cluster.
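To give a feel for what "evaluating load balance" can mean, here is a small Python sketch of one common imbalance measure: the busiest rank's time divided by the average, minus one. This is not Intel's method or the Trace Analyzer's output format; the function name and the per-rank timings are hypothetical.

```python
from statistics import mean


def load_imbalance(busy_seconds):
    """Return (imbalance, slowest_rank) for a list of per-rank busy times.

    imbalance is max/mean - 1: 0.0 means perfectly even work, and 0.6 means
    the busiest rank worked 60 per cent longer than the average rank.
    """
    if not busy_seconds:
        raise ValueError("need at least one rank")
    avg = mean(busy_seconds)
    if avg == 0:
        return 0.0, 0  # nobody did any work, so nothing is out of balance
    slowest_rank = max(range(len(busy_seconds)), key=lambda r: busy_seconds[r])
    return busy_seconds[slowest_rank] / avg - 1.0, slowest_rank


# Hypothetical compute times in seconds for four MPI ranks.
imbalance, rank = load_imbalance([10.0, 10.0, 10.0, 20.0])
print(f"rank {rank} is the straggler, imbalance {imbalance:.0%}")
```

Because every rank tends to wait for the slowest one at the next synchronization point, a single straggler like rank 3 here sets the pace for the whole job.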

The Cluster Studio XE bundle includes the Intel v12.1 compilers that were launched in September, which offered between 22 and 27 per cent better performance on Fortran benchmarks and from 6 to 11 per cent on C/C++ integer performance compared to the v12.0 releases running on Linux and Windows machines. C/C++ floating point performance improvements were a few points. Intel claims it has a considerable performance advantage over other compilers, anywhere from 21 to 47 per cent faster code execution on C, C++, and Fortran tests. And that performance is not just tied to Intel's own Xeon processors.

Perhaps more significantly, on Fortran, Intel now believes it has the performance edge over Portland Group 11.4 and Absoft 11.1 on either Windows or Linux machines. The performance jump is particularly acute on Windows machines running C++.

"We believe that we have the best performance, regardless of the type of x86 chip," James Reinders, evangelist for Intel's software division, tells El Reg.

The v12.1 compilers are tuned up for the forthcoming Xeon E5 processors, and even though Intel has not been able to get its hands on machines using AMD's impending "Interlagos" Opteron 6200 processors to tune and test them, Reinders says that he is confident that the compilers and the Cluster Studio XE tools will wring more flops out of these AMD chips than the alternatives.

The interesting twist in all this is that the Cluster Studio compilers and tuning and visualization tools cannot peer into GPU coprocessors, and Reinders says he is not even sure how Intel would go about doing that. But because the future "Knights" x86-based coprocessors are based on the same architecture as Intel and AMD chips, Cluster Studio XE tools will be able to see into these MIC coprocessors and help coders tweak and tune their apps for them.

The normal Cluster Studio stack, which includes the Intel compilers as well as the math and clustering libraries, costs $1,849 per developer on a Linux workstation and $1,499 per developer on a Windows workstation. There is no runtime or royalty charge for having the tools run on a parallel x86 cluster. If you want to go all the way to the Cluster Studio XE stack, then you pay $2,849 per developer on Linux and $2,499 on Windows. Yes, the Windows versions are cheaper. ®
