The Register® — Biting the hand that feeds IT

Nvidia and ARM: It's a parallel, parallel, parallel world

Big changes coming to the CUDA programming model

GTC 2013: Nvidia envisions a future in which ARM processors and the GPU-maker's CUDA parallel-computing platform and programming model will work together in perfect harmony, and the company has a raft of planned CUDA enhancements not only to make that coexistence seamless, but also to improve the programming environment for discrete GPUs.

"If we look five years out, we expect that ARM will be a very important platform for CUDA," Nvidia's chief technologist for GPU computing software, Mark Harris, told his audience on Tuesday at the GPU Technology Conference in San José, California.

Today, there's no SoC that combines ARM compute cores with a CUDA-enabled GPU, but that's about to change. Nvidia's next Tegra processor, code-named "Logan", will incorporate CUDA 5 support when it hits full production early next year, and its follow-on, "Parker", will upgrade that capability in a processor that Nvidia president and CEO Jen-Hsun Huang promises will have 100 times the performance of 2011's Tegra 2.

The work of Harris and his team on CUDA aims to ease the notoriously difficult challenge of parallel programming.

One of the first items on his list, as The Reg has reported in detail, is Nvidia's work with Continuum Analytics to create the NumbaPro Python-to-GPU compiler. Why Python? Well, Harris said, it's not only an exceptionally popular language, but coding in it is productive, interactive, and "even fun."

But there's more to CUDA's future than ARM and Python. For one, the CUDA development team plans compiler improvements such as just-in-time (JIT) compilation and linking of device code. "This will enable you to specialize code and even generate code on the fly," he said.

Support for C++11 is also in the cards, plus what Harris characterized as "really fast" sparse solvers and the addition of multi-GPU support to some CUDA libraries "where it makes sense."

C++11 will not, of course, be the last iteration of the language, and Harris said that for the next version, scheduled for around 2017, "we would like for accelerators such as GPUs to be a core part of programming in C++." Toward that goal, Nvidia is working with others to include a library of parallel algorithms in that version of C++.
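To give a concrete flavor of what a standard "library of parallel algorithms" looks like, here is a minimal sketch using std::transform_reduce, one of the algorithms that did end up in C++17. It is written against that later standard, not anything Nvidia shipped in 2013, and the helper name dot is hypothetical. C++17 also lets a caller pass an execution policy such as std::execution::par as the first argument to request parallel execution; it is left out here so the sketch builds on toolchains without a parallel backend.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: dot product of two vectors using the C++17
// standard algorithm std::transform_reduce.
// It multiplies a[i] * b[i] pairwise and sums the products. The sum is a
// "generalized sum": the library may regroup and reorder the additions,
// which is what makes the algorithm parallelizable. Without an execution
// policy argument it still runs on the calling thread; passing
// std::execution::par (from <execution>) first would request parallelism.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    // Precondition (assumed, not checked): b.size() >= a.size().
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0.0);
}
```

The design point is that the programmer states what to compute (a reduction over pairwise products) rather than how to split it across cores, leaving the scheduling to the library.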

"And of course," he said, "we're always improving the development tools" for CUDA, such as adding "step-by-step guidance to finding developments in your application" to the next generations of Nvidia's Visual Profiler and Nsight Eclipse Edition. Harris said that these additions would present developers with information about the bottlenecks in their applications in a more visual way.

Currently, CUDA developers need to use a separate GPU to run their display from the GPU on which they're debugging their application. "In the future," Harris said, "we'll be lowering that restriction and enabling single-GPU debug."

Those improvements, he said, will come relatively soon, but a number of larger challenges remain. As heterogeneous computing becomes more prevalent, for example, it becomes increasingly important to control the locality of data – keeping it near the core or cores that are working with it. The challenge there, he said, is to add that capability "without getting in the way, without making development more difficult."

Nvidia may be able to tackle that challenge on its own, Harris said, but a bit further down the road it's going to become increasingly important for operating systems to support what he described as "hybrid computer architectures." To that end, he said, Nvidia is working with the developers of Windows, Linux, and OS X.

There's also work to be done with compiler developers. "Obviously," he said, "compiling code for these interesting hybrid architectures is essential."

Harris believes that hybrid parallel computing will eventually become ubiquitous, since all processors being designed today are parallel in one form or another: what he referred to as "the power wall" has pushed processor designers to improve performance by adding parallelism rather than by simply cranking up clock speeds.

"In the future," he said, "all programmers should be parallel programmers, or should be able to at least create parallel programs" – a subtle but important distinction, and one that compiler developers can help to define.

That power wall, along with the increasing number of transistors in modern processors that need to be power-managed, will force programmers to keep their eyes on the power prize. In the future, Harris said, programmers will need to tune their apps not just for performance, but for performance-per-watt as well, and tools and programming models need to be created to give developers that ability.
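Tuning for performance-per-watt can point in a different direction than tuning for raw speed. The sketch below uses made-up numbers and hypothetical names (Config, perf_per_watt, fastest, most_efficient) to compare two settings of the same program: one doing 100 work units per second while drawing 200 W, the other doing 80 units per second on 120 W. The first is faster, but the second gets more work out of each joule (about 0.67 versus 0.5 units per joule).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical measurements for one setting of an application.
struct Config {
    std::string name;
    double throughput;   // work units per second (hypothetical units)
    double power_watts;  // average power draw in watts
};

// Performance per watt: (work / s) / (J / s) = work per joule.
double perf_per_watt(const Config& c) {
    return c.throughput / c.power_watts;
}

// Returns the setting with the highest raw throughput.
// Precondition (assumed, not checked): cs is non-empty.
const Config& fastest(const std::vector<Config>& cs) {
    return *std::max_element(cs.begin(), cs.end(),
        [](const Config& x, const Config& y) {
            return x.throughput < y.throughput;
        });
}

// Returns the setting with the best performance-per-watt.
// Precondition (assumed, not checked): cs is non-empty.
const Config& most_efficient(const std::vector<Config>& cs) {
    return *std::max_element(cs.begin(), cs.end(),
        [](const Config& x, const Config& y) {
            return perf_per_watt(x) < perf_per_watt(y);
        });
}
```

With those hypothetical numbers, fastest() picks the 200 W setting while most_efficient() picks the 120 W one, which is exactly the kind of trade-off a power-aware tool would need to surface.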

"There's a lot of really big challenges in this," Harris told his audience, "so those of you that are researchers, there's a lot of hard problems to solve here, so lots of great research areas and directions."

Harris wrapped up his talk with his vision of an ARM-heavy future in which, he said, "it will be very common to be programming on the CUDA platform on ARM processors, on ARM systems across various industries and architectures."

And on discrete GPUs, as well. "Of course, we're Nvidia, so we envision GPUs everywhere," Harris said. ®