Weather gets granular with GPUs

Just say NOAA

HPC Blog Everyone complains about the weather, but no one is doing anything about it.

The folks at the National Oceanic and Atmospheric Administration (NOAA) aren't doing anything about the weather. They're too busy trying to figure out what it's going to do tomorrow and next week.

I sat in on a very interesting presentation from NOAA on Tuesday afternoon at the GPU Technology Conference about how they're going to use GPUs to sharpen up their forecasts and dial them in to a much finer resolution. This is quite a computational problem, as it turns out. In 2008, it took 800 cores to drive their model at 15 to 30 km resolution. To get to 10 km resolution, it took a bit more hardware (125,200 more processor cores, to be exact) for a grand total of 126,000 cores.

Their next step is to get to 3.5 km resolution, which is an entirely different kettle of fish. The only way to reach that level of granularity is to move to GPUs in a big way - which is what they're pursuing right now. They've learned some lessons along the way, the foremost being that the key to exploiting GPUs efficiently is knowing your code intimately.

For example, when they were running their models exclusively on CPUs, interprocess communication used up about 5 per cent of the cycles. The move to GPUs didn't change the need for these communications or the time they take, but because the GPUs get through the compute so much faster, communication swelled to roughly 50 per cent of the total runtime - making it enemy number one.
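
To see why, here's a quick back-of-the-envelope sketch (my arithmetic, not NOAA's; the 19x compute speed-up is an assumption picked purely to reproduce the 5 to 50 per cent shift). If the communication time stays fixed while the compute portion gets faster, communication's share of the run balloons:

    // Illustrative only: how a fixed communication cost grows as a share of
    // the run once the compute portion is accelerated. Compiles with nvcc or g++.
    #include <cstdio>

    int main() {
        const double comm_frac = 0.05;  // communication share on the CPU-only runs
        const double speedup   = 19.0;  // assumed speed-up on the compute portion

        double compute  = (1.0 - comm_frac) / speedup;        // compute time after the port
        double new_frac = comm_frac / (comm_frac + compute);  // new communication share

        printf("communication is now %.0f%% of the run\n", new_frac * 100.0);  // prints 50%
        return 0;
    }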

Memory management is also hugely important. GPUs are incredibly fast on the right code, but failing to understand how best to use the memory on the GPU card can keep you from getting the most out of them. There are two classes of memory on the cards: the 16KB of fast on-chip memory sitting closest to the GPU cores, and the much larger (1GB in NOAA's case) global memory shared by the whole card.

The difference in access speed is profound - it takes only two cycles to get to the close memory, and 100 cycles to get to the global memory. Accessing memory back on the host server would, presumably, be measured in geologic time. Wise use of the blazingly fast, but tiny, memory close to the cores can make the difference between going faster and going a whole hell of a lot faster.
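
To make the two classes of memory concrete, here's a minimal CUDA sketch (my own illustration, not NOAA's code) that stages a tile of the big, slow global memory into the tiny, fast on-chip memory before doing the arithmetic on it:

    // Stage data from large-but-slow global memory into small-but-fast
    // on-chip shared memory, then compute a 3-point smoothing from there.
    // Illustrative sketch only.
    #include <cuda_runtime.h>

    #define TILE 256

    __global__ void smooth(const float *in, float *out, int n) {
        __shared__ float tile[TILE + 2];          // fast on-chip memory, shared by the block
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        int lid = threadIdx.x + 1;                // local index, leaving room for halo cells

        if (gid < n) {
            tile[lid] = in[gid];                  // one trip to global memory per element
            if (threadIdx.x == 0)                 // left halo, clamped at the domain edge
                tile[0] = (gid > 0) ? in[gid - 1] : in[gid];
            if (threadIdx.x == blockDim.x - 1 || gid == n - 1)  // right halo, clamped
                tile[lid + 1] = (gid < n - 1) ? in[gid + 1] : in[gid];
        }
        __syncthreads();                          // wait until the whole tile is loaded

        if (gid < n)                              // every read below hits fast on-chip memory
            out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *d_in, *d_out;
        cudaMalloc(&d_in,  n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));   // stand-in for copying a real field over
        smooth<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
        cudaDeviceSynchronize();
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Each element makes one trip out of global memory; the three reads the smoothing actually needs all come out of the fast tile.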

Likewise, constantly fetching data from the CPU-based host server is also costly from a performance standpoint. One weather model, called WRF (pronounced "Worf," like the Star Trek guy), showed a 20x speed-up in raw performance that shrank to 7x once the time needed to copy data from the server over and over again was taken into account. The NOAA folks have restructured their programs to minimize data copying and have seen performance rise commensurately.
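
In CUDA terms, the pattern is to keep the model state resident on the card across timesteps and copy results back only when they're actually needed. A rough sketch (my illustration; step_kernel is a hypothetical stand-in for the real model physics):

    // Keep the state in GPU memory for the whole run; copy back to the host
    // only at output intervals instead of every single timestep.
    // Illustrative sketch only.
    #include <cuda_runtime.h>

    __global__ void step_kernel(float *state, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) state[i] += 1.0f;               // placeholder for the real physics
    }

    void run_model(float *host_state, int n, int nsteps, int output_every) {
        float *d_state;
        cudaMalloc(&d_state, n * sizeof(float));
        cudaMemcpy(d_state, host_state, n * sizeof(float), cudaMemcpyHostToDevice);  // one copy in

        for (int t = 0; t < nsteps; ++t) {
            step_kernel<<<(n + 255) / 256, 256>>>(d_state, n);
            if ((t + 1) % output_every == 0)       // copy back only when output is due
                cudaMemcpy(host_state, d_state, n * sizeof(float), cudaMemcpyDeviceToHost);
        }

        cudaFree(d_state);
    }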

Right now they're seeing speed-ups ranging from 15x to 39x with GPU + CPU systems versus exclusively CPU-based hardware. This is with fully optimized CUDA code running on a smallish pilot system, but it has proven the validity of their approach and is a pretty big win. Their push going forward is to scale the model up to larger, GPU-fueled hardware and complete the transition to 3.5 km resolution. ®
