Intel stretches HPC dev tools across chubby clusters

Cluster Studio XE ready for MICs, not for GPUs

SC11 Supercomputing hardware and software vendors are getting impatient for the SC11 supercomputing conference in Seattle, which kicks off next week. More than a few have jumped the gun with product announcements this week, including chipmaker Intel.

No, Intel is not going to launch its "Sandy Bridge-EP" Xeon E5 processors, which are expected early next year. But the new Cluster Studio XE toolset for HPC customers will help those lucky few HPC and cloud shops that have been able to get systems this year to squeeze more performance out of their Xeon E5 clusters.

The Cluster Studio XE stack includes a slew of Intel tools for creating, tuning, and monitoring parallel applications running on x86-based parallel clusters. Intel had already been selling a set of application tools called Cluster Studio, which bundled up the chip giant's C, C++, and Fortran compilers, its rendition of the message passing interface (MPI) messaging protocol that allows server nodes to share work, and various math and multithreading libraries to goose the performance of applications.

With the XE (Extended Edition) of the HPC cluster tools, Intel is goosing the performance of the MPI library, and claims its MPI 4.0.3 stack is anywhere from 3.3 to 6.5 times as fast as the OpenMPI 1.5.4 and MVAPICH2 1.6 MPI stacks from the open source community. Benchmark tests were done on a 64-node system running 768 processes and linked by InfiniBand switches.

Intel also pitted the Platform Computing MPI 8.1.1 stack against the three MPI stacks listed above, only this time on an eight-node system; in this case the performance differences between Intel and Platform (which is now owned by IBM) were not huge. Against the Microsoft MPI 3.2 stack on the same iron, the Intel MPI stack running on Windows servers was anywhere from 2.17 to 2.74 times faster.

The updated Intel MPI stack can scale to over 90,000 MPI cores, and also has hooks into the open source SLURM job scheduler that was created by Lawrence Livermore National Laboratory because of its frustration with closed-source job schedulers and the state of the open source ones.

With the Cluster Studio XE roll-up, the Inspector and Debugger modules now have cluster-level data gathering and reporting, instead of just seeing things at a node level. What this means, in plain American, is that these add-ons to the compilers can look for memory leaks and threading errors across a cluster of machines without sending the HPC application programmer on a wild goose chase to locate performance issues or crashes on an individual node. (With 90,000 cores, which works out to 5,625 two-socket nodes using the future eight-core Xeon E5 processors, you can't look for these issues manually.)
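For anyone who wants to check that node count, here is a back-of-the-envelope sketch in Python. It assumes two-socket nodes, which is the configuration that makes 90,000 cores on eight-core chips come out to 5,625 machines; the function name and parameters are ours, for illustration only.

```python
def nodes_needed(total_cores: int, cores_per_socket: int, sockets_per_node: int) -> int:
    """Return how many nodes it takes to supply total_cores, rounding up."""
    cores_per_node = cores_per_socket * sockets_per_node
    # Ceiling division: a partly filled node still counts as a whole node.
    return -(-total_cores // cores_per_node)


# Assumption: two-socket servers built from eight-core Xeon E5 chips.
print(nodes_needed(90_000, cores_per_socket=8, sockets_per_node=2))
```

Swap in one socket per node and the count doubles to 11,250, which is why the socket assumption matters.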

The Trace Analyzer and Collector module can now look at MPI performance across the nodes in a cluster and evaluate how well MPI is load balancing across the nodes. The VTune Amplifier, which is a tool that Intel uses to visualize the threading behavior in a single node, can now show threading issues across the cluster.
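To give a flavor of what "evaluating load balance" can mean, here is a minimal Python sketch of one common imbalance measure: the slowest rank's time divided by the average, minus one. This is not how Intel's Trace Analyzer computes anything, as far as we know; the function names, the 10 per cent tolerance, and the timings are all made up for illustration.

```python
from statistics import mean


def load_imbalance(per_rank_seconds):
    """Fraction by which the slowest rank exceeds the average (0.0 = perfectly balanced)."""
    avg = mean(per_rank_seconds)
    return max(per_rank_seconds) / avg - 1.0


def stragglers(per_rank_seconds, tolerance=0.10):
    """Ranks whose time exceeds the average by more than the given tolerance."""
    avg = mean(per_rank_seconds)
    return [rank for rank, t in enumerate(per_rank_seconds) if t > avg * (1 + tolerance)]


# Hypothetical per-rank compute times in seconds; rank 3 is doing too much work.
timings = [10.0, 10.0, 10.0, 14.0]
print(f"imbalance: {load_imbalance(timings):.1%}")
print("stragglers:", stragglers(timings))
```

In an MPI job the other ranks sit idle at the next synchronization point waiting for the slowest one, so a single straggler sets the pace for the whole cluster.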

The Cluster Studio XE bundle includes the Intel v12.1 compilers that were launched in September, which offered between 22 and 27 per cent better performance on Fortran benchmarks and from 6 to 11 per cent on C/C++ integer performance compared to the v12.0 releases running on Linux and Windows machines. C/C++ floating point performance improvements were a few points. Intel claims it has a considerable performance advantage over other compilers: anywhere from 21 to 47 per cent faster code execution on C, C++, and Fortran tests. And that performance is not just tied to Intel's own Xeon processors.

Perhaps more significantly, on Fortran, Intel now believes it has the performance edge over Portland Group 11.4 and Absoft 11.1 on either Windows or Linux machines. The performance jump is particularly acute on Windows machines running C++.

"We believe that we have the best performance, regardless of the type of x86 chip," James Reinders, evangelist for Intel's software division, tells El Reg.

The v12.1 compilers are tuned up for the forthcoming Xeon E5 processors, and even though Intel has not been able to get its hands on machines using AMD's impending "Interlagos" Opteron 6200 processors to tune and test them, Reinders says that he is confident that the compilers and the Cluster Studio XE tools will wring more flops out of these AMD chips than the alternatives.

The interesting twist in all this is that the Cluster Studio compilers and tuning and visualization tools cannot peer into GPU coprocessors, and Reinders says he is not even sure how Intel would go about doing that. But because the future "Knights" x86-based coprocessors are based on the same architecture as Intel and AMD chips, Cluster Studio XE tools will be able to see into these MIC coprocessors and help coders tweak and tune their apps for them.

The normal Cluster Studio stack, which includes the Intel compilers as well as the math and clustering libraries, costs $1,849 per developer on a Linux workstation and $1,499 per developer on a Windows workstation. There is no runtime or royalty charge for having the tools run on a parallel x86 cluster. If you want to go all the way to the Cluster Studio XE stack, then you pay $2,849 per developer on Linux and $2,499 on Windows. Yes, the Windows versions are cheaper. ®
