
Jaguar to Titan? Not so bad…

Keptacular Metamorphosis


At SC11 I had the opportunity to talk to some of the people responsible for the biggest computer upgrade known to man. Oak Ridge National Labs is upgrading its current Cray XT5 ‘Jaguar’ system to a Cray XK6 system that will be known as ‘Titan’.

It’s quite a facelift. Today, Jaguar is a 1.75 PFlop supercomputer with more than 18,000 nodes stuffed with six-core AMD Istanbul processors – 224,162 cores in all. In 2009 it turned in better than a petaflop of sustained performance, good enough for the number one slot on the Top500 list.

Two years later it’s not exactly a performance dog, but it’s been knocked down to number three on the list, supplanted by the Fujitsu 10.5 PFlop K Computer and China’s NUDT 2.56 PFlop Tianhe-1A.

The transition from Jaguar to Titan will be profound, with a performance boost to somewhere around 20 PFlops – which should put it near the top, if not at the pinnacle, of the Top500. The biggest factor in the upgrade will be the move from a traditional CPU-based architecture to a hybrid CPU+GPU design.

In final form, which will be achieved next year, each of the 18,000+ Titan nodes will have one 16-core AMD Interlagos processor and an NVIDIA Kepler GPU accelerator. Titan will have many more CPU cores than Jaguar, plus the extra oomph of the 18,000-plus Kepler GPUs in the mix. This will make Titan the largest hybrid supercomputer in the world – not just “GPU-riffic” but “Keptacular” as well. “Keptastic,” perhaps?

The biggest hurdle here isn’t the hardware; it’s the software, right? How the hell do you CUDA-ize the hundreds of applications and millions of lines of code that are running on Jaguar and will need to run on Titan? Not surprisingly, Cray and pals NVIDIA, PGI, and CAPS have been pondering this one. They’ve come up with OpenACC, and are presenting it as a parallel programming standard.

What OpenACC does is allow programmers to insert ‘directives’ into their code that alert the compiler to routines that should be parallelized – farmed out to multiple cores or to accelerators. The compiler does the heavy lifting, and the programmer doesn’t have to change any of the underlying code (other than adding the directives, that is – and there are tools to help with that too).
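
To make that concrete, here’s a minimal sketch of what a directive looks like, using the SAXPY loop that every accelerator tutorial seems to start with (the function and the data clauses are my illustration, not Cray’s actual code). The loop body is untouched; the pragma just tells an OpenACC compiler to offload the loop and shuttle the arrays to and from the accelerator:

#include <stdio.h>

/* y = a*x + y, with an OpenACC directive. A compiler that doesn't
   understand the pragma simply ignores it and builds a serial loop. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1024 };
    float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(N, 3.0f, x, y);
    printf("y[0] = %.1f\n", y[0]);  /* expect 5.0 */
    return 0;
}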

I don’t pretend to know any of the ins and outs of writing parallel applications (well, I do pretend to know it if I’m certain that I’m talking to people who are dumber than I am), but a presentation from Cray’s John Levesque gave me some idea of how well OpenACC works.

One of his examples compared the performance of the CAM-SE climate model using different routes to CUDA-ization. On the current system, CAM-SE’s REMAP function took 65.3 minutes to complete. Or was it seconds? (He went damned fast in the presentation, and I was in the back, but we’re talking relative performance here.)

After a rewrite in anticipation of porting to an accelerator, they knocked it down to about 33.5. Hand-coding the result for CUDA got them to 10.2 – call it a 6.4x speed-up over the original. Taking the same rewritten code and running it through OpenACC gave them a 10.6 runtime – within about four per cent of the hand-coded version.

The cool thing about OpenACC is that it’s portable and chip-agnostic. Using it should enable better parallelism on general-purpose multi-core CPUs as well as GPU accelerators. Here’s the NVIDIA press release with some more details.
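
And since the directives are just pragmas, the portability claim is easy to believe: a compiler without OpenACC support ignores them and produces an ordinary serial binary from the very same source. A hedged sketch of what that looks like in practice, assuming PGI’s OpenACC support and its -acc and -Minfo=accel options (flag spellings vary by compiler, so check the docs):

pgcc -acc -Minfo=accel saxpy.c -o saxpy_acc    # directives honoured, loop offloaded
gcc  -std=c99 saxpy.c -o saxpy_cpu             # pragma ignored, plain CPU build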
