Feeds

Weather gets granular with GPUs

Just say NOAA

Internet Security Threat Report 2014

HPC Blog Everyone complains about the weather, but no one is doing anything about it.

The folks at the National Oceanic and Atmospheric Administration (NOAA) aren't doing anything about the weather. They're too busy trying to figure out what it's going to do tomorrow and next week.

I sat in on a very interesting presentation from NOAA on Tuesday afternoon at the GPU Technology Conference about how they're going to use GPUs to sharpen up their forecasts and dial them in to a much greater resolution. This is quite a computational problem, as it turns out. In 2008, it took 800 cores to drive their model at a 15 to 30 kilometer resolution. To get to a 10 KM resolution, it took a bit more hardware –125,200 more processor cores, to be exact – for a grand total of 126,000 cores.

Their next step is to get to 3.5 KM resolution, which is an entirely different kettle of fish. The only way to get to this level of granularity is to move to GPUs in a big way - which is what they're pursuing right now. They've learned some lessons along the way, the foremost being that the key to efficiently taking advantage of GPUs is to intimately know their code.

For example, when they were running their models with CPUs exclusively, interprocess communications used up about 5 per cent of the cycles. The move to GPUs didn't change the need or the time necessary for these communications, but because of the greater speed of the GPUs, the ratio of communication time to processing became 50% of total processing - making these processes enemy number one.

Memory management is also hugely important. GPUs are incredibly fast on the right code, but not understanding how to best utilize the memory on the GPU card can keep you from getting the most out of them. There are two classes of memory on the cards: the 16k that is closest to each GPU core, and then the much larger (1GB in the NOAA situation) global shared memory on the card.

The difference in speed in accessing this memory is profound - it takes only two cycles to get to the close memory, and 100 cycles to get to the global memory. Accessing memory on the server host would, assumedly, be measured in geologic time. Wise use of the blazingly fast, but tiny, memory attached to each core can make the difference between going faster and going a whole hell of a lot faster.

Likewise, constantly fetching data from the CPU-based host server is also costly from a performance standpoint. One weather model, called WRF (pronounced "Worf," like the Star Trek guy) showed a 20x speed-up in raw performance that shrunk to 7x when taking into account the time needed to copy data from the server over and over again. The NOAA folks have restructured their programs to minimize data copying and have seen performance rise commensurately.

Right now they're seeing performance ranging from 15x to 39x speed-up with GPU + CPU systems vs. exclusively CPU-based hardware. This is with fully optimized CUDA code running on a smallish pilot system, but it has proven the validity of their approach and is a pretty big win. Their push going forward is to scale the model to larger hardware - fueled by GPUs - completing the transition to the 3.5 KM resolution. ®

Internet Security Threat Report 2014

More from The Register

next story
Docker's app containers are coming to Windows Server, says Microsoft
MS chases app deployment speeds already enjoyed by Linux devs
IBM storage revenues sink: 'We are disappointed,' says CEO
Time to put the storage biz up for sale?
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
'Urika': Cray unveils new 1,500-core big data crunching monster
6TB of DRAM, 38TB of SSD flash and 120TB of disk storage
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
SDI wars: WTF is software defined infrastructure?
This time we play for ALL the marbles
Windows 10: Forget Cloudobile, put Security and Privacy First
But - dammit - It would be insane to say 'don't collect, because NSA'
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.