Nvidia reveals CUDA 6, joins CPU-GPU shared memory party

Tesla headman: 'Biggest pain point' for developers – memory management – is now history

Combat fraud and increase customer satisfaction

Nvidia has announced the latest version of its GPU programming language, CUDA 6, which adds a "Unified Memory" capability that, as its name implies, relieves programmers from the trials and tribulations of having to manually copy data back and forth between separate CPU and GPU memory spaces.

CUDA 6 Unified Memory schematic

CUDA 6 fools a system's CPU and GPU into thinking they're dipping into the same shared memory bank

"Programmers have always found it hard to program GPUs," Sumit Gupta, the general manager of Nvidia's HPC-focused Tesla biz told The Reg, "and one of the biggest reasons for that – in fact, this is the reason – has been that there were always two memory spaces: the CPU and its memory, and the GPU and its own memory."

Being software, CUDA of course does nothing to physically unite those two memory spaces – the CPU still has its own memory and the GPU has its own chunk. To a programmer using CUDA 6, however, that distinction disappears: all the memory access, delivery, and management goes "underneath the covers," to borrow the phrase Oracle's Nandini Ramani used to describe Java 8's approach to parallel programming at this week's AMD developer conference, APU13.

From the point of view of the developer using CUDA 6, the memory spaces of the CPU and GPU might as well be physically one and the same. "The developer now can just operate on the data," Gupta says.

In other words, if a dev wants to add A to B, and A is in the CPU memory while B is in the GPU memory, the newly lucky dev can now just say "add A to B," and not give a fig about where either bit of data resides – the underlying CUDA 6 plumbing will take care of accessing A and B and munging them together.

The 'super simplified' memory management code introduced in CUDA 6

Before CUDA 6, left; and after, right (click to enlarge)

According to Gupta, this new capability reduces programming effort by almost 50 per cent. Not being a CUDA programmer himself, your Reg reporter will have to wait for reports from the field – or in the article comments – to judge the veracity of the Tesla honcho's assertion.

To support his point, Gupta said, "We have several programmers who have told us that their biggest pain point on day one was always managing the data movement and the memory and the memory management. And by taking care of that, automatically doing that, we've significantly improved programmer productivity."

There is, of course, still some latency involved in moving the data from where, for example, the CPU can work with it to where the GPU can get its hands – or cores – on it, but the developer doesn't have to worry about writing the code to transfer it, nor does the compiler have to deal with the extra lines of code that were previously necessary to accomplish that move.

CUDA 6 adds a few other niceties such as new drop-in libraries that replace some CPU libraries with GPU libraries, and some redesigned GPU libraries that automatically scale across up to eight GPUs in a single node.

But Gupta told us that what devs have been clamoring for most avidly is to be freed from memory-management chores, which Unified Memory provides.

With CUDA 6, he said, "The programmer just blissfully programs." ®


In a related development, Mentor Graphics has announced that it is adding support for OpenACC 2.0 into its GCC compiler, thus adding the ability to generate assembly-level instructions for Nvidia GPUs into that industry-standard tool.

High performance access to file storage

More from The Register

next story
Android engineer: We DIDN'T copy Apple OR follow Samsung's orders
Veep testifies for Samsung during Apple patent trial
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Batten down the hatches, Ubuntu 14.04 LTS due in TWO DAYS
Admins dab straining server brows in advance of Trusty Tahr's long-term support landing
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Half of Twitter's 'active users' are SILENT STALKERS
Nearly 50% have NEVER tweeted a word
Windows XP still has 27 per cent market share on its deathbed
Windows 7 making some gains on XP Death Day
Internet-of-stuff startup dumps NoSQL for ... SQL?
NoSQL taste great at first but lacks proper nutrients, says startup cloud whiz
Windows 8.1, which you probably haven't upgraded to yet, ALREADY OBSOLETE
Pre-Update versions of new Windows version will no longer support patches
Microsoft TIER SMEAR changes app prices whether devs ask or not
Some go up, some go down, Redmond goes silent
Red Hat to ship RHEL 7 release candidate with a taste of container tech
Grab 'near-final' version of next Enterprise Linux next week
prev story


Designing a defence for mobile apps
In this whitepaper learn the various considerations for defending mobile applications; from the mobile application architecture itself to the myriad testing technologies needed to properly assess mobile applications risk.
3 Big data security analytics techniques
Applying these Big Data security analytics techniques can help you make your business safer by detecting attacks early, before significant damage is done.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.