
Nvidia and ARM: It's a parallel, parallel, parallel world

Big changes coming to the CUDA programming model


GTC 2013 Nvidia envisions a future in which ARM processors and the GPU-maker's CUDA parallel-computing platform and programming model will work together in perfect harmony, and the company has a raft of planned CUDA enhancements to not only make that coexistence seamless, but to enhance that programming environment for discrete GPUs, as well.

"If we look five years out, we expect that ARM will be a very important platform for CUDA," Nvidia's chief technologist for GPU computing software, Mark Harris, told his audience on Tuesday at the GPU Technology Conference in San José, California.

Today, there's no SoC that combines ARM compute cores with a CUDA-enabled GPU, but that's about to change. Nvidia's next Tegra processor, code-named "Logan", will incorporate CUDA 5 support when it hits full production early next year, and its follow-on, "Parker", will upgrade that capability in a processor that Nvidia president and CEO Jen-Hsun Huang promises will have 100 times the performance of 2011's Tegra 2.

Harris and his team's work on CUDA aims to make the notoriously difficult parallel programming challenge smoother.

One of the first items on his list, as The Reg has reported in detail, is Nvidia's work with Continuum Analytics to create the NumbaPro Python-to-GPU compiler. Why Python? Well, Harris said, it's not only an exceptionally popular language, but coding in it is productive, interactive, and "even fun."

But there's more to CUDA's future than ARM and Python. For one, the CUDA development team plans compiler improvements such as just-in-time (JIT) compilation and linking of device code. "This will enable you to specialize code and even generate code on the fly," he said.
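What "specialize code and even generate code on the fly" means in practice can be sketched with a plain-Python analogy. This is not CUDA's JIT interface, and the names `make_scaled_sum` and `scaled_sum` are hypothetical, chosen only for illustration. The idea is that values known only at run time (here a scale factor and a vector length) are baked into freshly generated source, which is then compiled before it is called.

```python
# Plain-Python analogy for runtime code specialization.
# Assumption: this only illustrates the idea; it is NOT CUDA's JIT API.

def make_scaled_sum(scale, length):
    """Generate and compile a function specialized for one scale and length.

    Both values are written into the generated source as constants, so the
    loop over the input is fully unrolled in the specialized function.
    """
    terms = " + ".join(f"{scale!r} * xs[{i}]" for i in range(length))
    source = f"def scaled_sum(xs):\n    return {terms or 0}\n"
    namespace = {}
    # Compile the generated source at run time and load it into a namespace.
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["scaled_sum"], source


# Specialize for scale=2 and three elements, then call the generated function.
scaled_sum3, generated_source = make_scaled_sum(2, 3)
result = scaled_sum3([1, 2, 3])
```

The specialized function contains no loop and no lookup of `scale`; both were resolved when the code was generated, which is the kind of saving runtime specialization is after.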

Support for C++ 11 is also in the cards, plus what Harris characterized as "really fast" sparse solvers and the addition of multi-GPU support to some CUDA libraries "where it makes sense."

C++ 11 will not, of course, be the last iteration of that language, and Harris said of the next version, scheduled for around 2017, "we would like for accelerators such as GPUs to be a core part of programming in C++." Toward that goal, Nvidia is working with others to include a library of parallel algorithms in that version of C++.

"And of course," he said, "we're always improving the development tools" for CUDA, such as adding "step-by-step guidance to finding developments in your application" to the next generations of Nvidia's Visual Profiler and Nsight Eclipse Edition. Harris said that these additions would present developers with information about the bottlenecks in their applications in a more-visual way.

Currently, CUDA developers need to use a separate GPU to run their display from the GPU on which they're debugging their application. "In the future," Harris said, "we'll be lowering that restriction and enabling single-GPU debug."

Those improvements, he said, will come relatively soon, but a number of larger challenges remain. As heterogeneous computing becomes more prevalent, for example, it becomes increasingly important to control the locality of data – keeping it near the core or cores that are working with it. The challenge there, he said, is to add that capability "without getting in the way, without making development more difficult."

Nvidia may be able to tackle that challenge on its own, Harris said, but a bit further down the road it's going to become increasingly important for operating systems to support what he described as "hybrid computer architectures." To that end, he said, Nvidia is working with the developers of Windows, Linux, and OS X.

There's also work to be done with compiler developers. "Obviously," he said, "compiling code for these interesting hybrid architectures is essential."

Harris believes that hybrid parallel computing will become ubiquitous. All processors being designed today are parallel in some form or another, he said, because what he referred to as "the power wall" has pushed processor designers to improve performance by adding parallelism rather than by simply cranking up clocks.

"In the future," he said, "all programmers should be parallel programmers, or shoud be able to at least create parallel programs" – a subtle but important distinction, and one that compiler developers can help to define.

That aforementioned power wall, along with the increasing number of transistors in modern processors that need to be power-managed, will cause programmers to keep their eyes on the power prize. In the future, Harris said, programmers will need to tune their apps not just for performance, but for performance-per-watt as well. Tools and programming models need to be created to give developers that ability.
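The difference between tuning for performance and tuning for performance-per-watt can be shown with a small sketch. The two configurations and their throughput and power figures below are made up for illustration and do not describe any real Nvidia part; the only point is that the fastest setting is not necessarily the most efficient one.

```python
# Performance-per-watt comparison.
# Assumption: the configurations and their numbers are hypothetical.

def perf_per_watt(gflops, watts):
    """Return throughput per unit of power, in GFLOPS per watt."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return gflops / watts


# Two hypothetical tunings of the same application.
configs = {
    "high_clock": {"gflops": 1000.0, "watts": 250.0},
    "low_clock": {"gflops": 800.0, "watts": 160.0},
}

efficiency = {name: perf_per_watt(**cfg) for name, cfg in configs.items()}

# The winner depends on which metric is being optimized.
fastest = max(configs, key=lambda name: configs[name]["gflops"])
most_efficient = max(efficiency, key=efficiency.get)
```

With these figures the high-clock tuning wins on raw throughput, while the low-clock tuning delivers more work per watt.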

"There's a lot of really big challenges in this," Harris told his audience, "so those of you that are researchers, there's a lot of hard problems to solve here, so lots of great research areas and directions."

Harris wrapped up his talk with his vision of an ARM-heavy future, in which, he said, "it will be very common to be programming on the CUDA platform on ARM processors, on ARM systems across various industries and architectures."

And on discrete GPUs, as well. "Of course, we're Nvidia, so we envision GPUs everywhere," Harris said. ®
