Feeds

Nvidia and ARM: It's a parallel, parallel, parallel world

Big changes coming to the CUDA programming model

5 things you didn’t know about cloud backup

GTC 2013 Nvidia envisions a future in which ARM processors and the GPU-maker's CUDA parallel-computing platform and programming model will work together in perfect harmony, and the company has a raft of planned CUDA enhancements to not only make that coexistence seamless, but to enhance that programming environment for discrete GPUs, as well.

"If we look five years out, we expect that ARM will be a very important platform for CUDA," Nvidia's chief technologist for GPU computing software, Mark Harris, told his audience on Tuesday at the GPU Technology Conference in San José, California.

Today, there's no SoC that combines ARM compute cores with a CUDA-enabled GPU, but that's about to change. Nvidia's next Tegra processor, code-named "Logan", will incorporate CUDA 5 support when it hits full production early next year, and its follow-on, "Parker", will upgrade that capability in a processor that Nvidia president and CEO Jen-Hsun Huang promises will have 100 times the performance of 2011's Tegra 2.

Harris and his team's work on CUDA aims to make the notoriously difficult parallel programming challenge smoother.

One of the first items on his list, as The Reg has reported in detail, is Nvidia's work with Continuum Analytics to create the NumbaPro Python-to-GPU compiler. Why Python? Well, Harris said, it's not only an exceptionally popular language, but coding in it is productive, interactive, and "even fun."

But there's more to CUDA's future than ARM and Python. For one, the CUDA development team plans compiler improvements such as just-in-time (JIT) compilation and linking of device code. "This will enable you to specialize code and even generate code on the fly," he said.

Support for C++ 11 is also in the cards, plus what Harris characterized as "really fast" sparse solvers and the addition of multi-GPU support to some CUDA libraries "where it makes sense."

C++ 11 will not, of course, be the last iteration of that language, and Harris said that he hopes that in the next version, scheduled for around 2017, "we would like for acelerators such as GPUs to be a core part of programming in C++." Towards that goal, Nvidia is working with others to include a library of parallel algorithms for that version of C++.

"And of course," he said, "we're always improving the development tools" for CUDA, such as adding "step-by-step guidance to finding developments in your application" to the next generations of Nvidia's Visual Profiler and Nsight Eclipse Edition. Harris said that these additions would present developers with information about the bottlenecks in their applications in a more-visual way.

Currently, CUDA developers need to use a separate GPU to run their display from the GPU on which they're debugging their application. "In the future," Harris said, "we'll be lowering that restriction and enabling single-GPU debug."

Those improvements, he said, will come relatively soon, but a number of larger challenges remain. As heterogeneous computing becomes more prevalent, for example, it becomes increasingly important to control the locality of data – keeping it near the core or cores that are working with it. The challenge there, he said, is to add that capability "without getting in the way, without making development more difficult."

Nvidia may be able to tackle that challenge on its own, Harris said, but a bit further down the road it's going to become increasingly important for operating systems to support what he described as "hybrid computer architectures." To that end, he said, Nvidia is working with the developers of Windows, Linux, and OS X.

There's also work to be done with compiler developers. "Obviously," he said, "compiling code for these interesting hybrid architectures is essential."

In the future, Harris believes that hybrid parallel computing will become ubiquitous, seeing as how all processors being designed today are parallel in some form or another because what he referred to as "the power wall" has pushed processor designers to improve performance by adding parallelism rather than by simply cranking up clocks.

"In the future," he said, "all programmers should be parallel programmers, or shoud be able to at least create parallel programs" – a subtle but important distinction, and one that compiler developers can help to define.

That aforementioned power wall, along with the increasing number of transistors in modern processors that need to be power-managed, will cause programmers to keep their eyes on the power prize. In the future, Harris said, programmers will need to tune their apps not just for performance, but for performance-per-watt as well. Tools and program models need to be created to give developers that ability.

"There's a lot of really big challenges in this," Harris told his audience, "so those of you that are researchers, there's a lot of hard problems to solve here, so lots of great research areas and directions."

Harris wrapped up his talk with his vision of an ARM-heavy future, when he says "it will be very common to be programming on the CUDA platform on ARM processors, on ARM systems across various industries and architectures."

And on discrete GPUs, as well. "Of course, we're Nvidia, so we envision GPUs everywhere," Harris said. ®

Boost IT visibility and business value

More from The Register

next story
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Why has the web gone to hell? Market chaos and HUMAN NATURE
Tim Berners-Lee isn't happy, but we should be
Linux turns 23 and Linus Torvalds celebrates as only he can
No, not with swearing, but by controlling the release cycle
China hopes home-grown OS will oust Microsoft
Doesn't much like Apple or Google, either
Apple promises to lift Curse of the Drained iPhone 5 Battery
Have you tried turning it off and...? Never mind, here's a replacement
Sin COS to tan Windows? Chinese operating system to debut in autumn – report
Development alliance working on desktop, mobe software
Eat up Martha! Microsoft slings handwriting recog into OneNote on Android
Freehand input on non-Windows kit for the first time
Linux kernel devs made to finger their dongles before contributing code
Two-factor auth enabled for Kernel.org repositories
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Endpoint data privacy in the cloud is easier than you think
Innovations in encryption and storage resolve issues of data privacy and key requirements for companies to look for in a solution.
Scale data protection with your virtual environment
To scale at the rate of virtualization growth, data protection solutions need to adopt new capabilities and simplify current features.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?