Nvidia and ARM: It's a parallel, parallel, parallel world

Big changes coming to the CUDA programming model

GTC 2013 Nvidia envisions a future in which ARM processors and the GPU-maker's CUDA parallel-computing platform and programming model will work together in perfect harmony, and the company has a raft of planned CUDA enhancements intended not only to make that coexistence seamless, but also to improve the programming environment for discrete GPUs.

"If we look five years out, we expect that ARM will be a very important platform for CUDA," Nvidia's chief technologist for GPU computing software, Mark Harris, told his audience on Tuesday at the GPU Technology Conference in San José, California.

Today, there's no SoC that combines ARM compute cores with a CUDA-enabled GPU, but that's about to change. Nvidia's next Tegra processor, code-named "Logan", will incorporate CUDA 5 support when it hits full production early next year, and its follow-on, "Parker", will upgrade that capability in a processor that Nvidia president and CEO Jen-Hsun Huang promises will have 100 times the performance of 2011's Tegra 2.

Harris and his team's work on CUDA aims to smooth the notoriously difficult task of parallel programming.

One of the first items on his list, as The Reg has reported in detail, is Nvidia's work with Continuum Analytics to create the NumbaPro Python-to-GPU compiler. Why Python? Well, Harris said, it's not only an exceptionally popular language, but coding in it is productive, interactive, and "even fun."
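
To give a flavour of what that looks like in practice, here is a minimal sketch of a GPU kernel written in Python. It uses the open-source Numba CUDA API as a stand-in for NumbaPro's own decorators, and the kernel, array sizes, and launch configuration are purely illustrative.

```python
# Rough sketch of Python-to-GPU compilation, using the open-source Numba
# CUDA API as a stand-in for NumbaPro (the exact NumbaPro decorators
# differed slightly; this is illustrative only).
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    # Each GPU thread computes one element of the result.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads = 256
blocks = (n + threads - 1) // threads
saxpy[blocks, threads](np.float32(2.0), x, y, out)  # compiled to GPU code at first call
```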

But there's more to CUDA's future than ARM and Python. For one, the CUDA development team plans compiler improvements such as just-in-time (JIT) compilation and linking of device code. "This will enable you to specialize code and even generate code on the fly," he said.
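
The toolkit feature Harris describes concerns JIT compilation and linking of device code within CUDA itself, but the flavour of "specializing code on the fly" can be sketched from the Python side too. Here is a hedged example, again using the open-source Numba CUDA API, in which the same kernel source is compiled at call time into a separate specialization for each argument type it meets.

```python
# Hedged illustration of runtime specialization. The CUDA feature Harris
# describes is device-code JIT/linking in the toolkit; this Python sketch
# only shows the general idea of generating specialized GPU code on the fly.
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

a32 = np.ones(1024, dtype=np.float32)
a64 = np.ones(1024, dtype=np.float64)

# The first call with each argument signature triggers compilation of a
# kernel specialized for that type; subsequent calls reuse the cached code.
scale[4, 256](a32, np.float32(3.0))
scale[4, 256](a64, np.float64(3.0))
```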

Support for C++11 is also in the cards, plus what Harris characterized as "really fast" sparse solvers and the addition of multi-GPU support to some CUDA libraries "where it makes sense."

C++11 will not, of course, be the last iteration of that language, and Harris said that in the next version, scheduled for around 2017, "we would like for accelerators such as GPUs to be a core part of programming in C++." Towards that goal, Nvidia is working with others to include a library of parallel algorithms in that future version of C++.

"And of course," he said, "we're always improving the development tools" for CUDA, such as adding "step-by-step guidance to finding developments in your application" to the next generations of Nvidia's Visual Profiler and Nsight Eclipse Edition. Harris said that these additions would present developers with information about the bottlenecks in their applications in a more-visual way.

Currently, CUDA developers need a second GPU to drive their display while they debug their application on another. "In the future," Harris said, "we'll be lowering that restriction and enabling single-GPU debug."

Those improvements, he said, will come relatively soon, but a number of larger challenges remain. As heterogeneous computing becomes more prevalent, for example, it becomes increasingly important to control the locality of data – keeping it near the core or cores that are working with it. The challenge there, he said, is to add that capability "without getting in the way, without making development more difficult."
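
To make concrete what controlling locality means today, here is a hedged sketch in the same open-source Numba CUDA API used above: the data is placed on the GPU once, worked on there across many kernel launches, and copied back only when the host actually needs the result. This is exactly the sort of by-hand management Nvidia would like to make less intrusive; the kernel and sizes are invented for illustration.

```python
# Hedged illustration of managing data locality by hand: allocate the
# array on the GPU once, run many kernels on it there, and copy it back
# only when the host needs the result.
import numpy as np
from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1.0

host = np.zeros(1 << 20, dtype=np.float32)
dev = cuda.to_device(host)          # one host-to-device transfer

blocks = (host.size + 255) // 256
for _ in range(100):                # data stays resident on the GPU between launches
    add_one[blocks, 256](dev)

result = dev.copy_to_host()         # one device-to-host transfer at the end
```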

Nvidia may be able to tackle that challenge on its own, Harris said, but a bit further down the road it's going to become increasingly important for operating systems to support what he described as "hybrid computer architectures." To that end, he said, Nvidia is working with the developers of Windows, Linux, and OS X.

There's also work to be done with compiler developers. "Obviously," he said, "compiling code for these interesting hybrid architectures is essential."

Looking further ahead, Harris believes that hybrid parallel computing will become ubiquitous. All processors being designed today are parallel in some form or another, he noted, because what he referred to as "the power wall" has pushed processor designers to improve performance by adding parallelism rather than by simply cranking up clock speeds.

"In the future," he said, "all programmers should be parallel programmers, or shoud be able to at least create parallel programs" – a subtle but important distinction, and one that compiler developers can help to define.

That aforementioned power wall, along with the increasing number of transistors in modern processors that need to be power-managed, will keep programmers' eyes on the power prize. In the future, Harris said, programmers will need to tune their apps not just for performance, but for performance-per-watt as well. Tools and programming models will need to be created to give developers that ability.
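
Those tools don't exist yet as part of CUDA, but a rough, hedged sketch of what measuring performance-per-watt might involve today is shown below, using the pynvml bindings to NVML to poll board power while a workload runs. The run_workload callable is a placeholder for whatever kernel launches a developer wants to time.

```python
# Hedged sketch: sample GPU board power through NVML while timing a
# workload, as a crude proxy for tuning performance-per-watt.
# run_workload is a placeholder supplied by the caller.
import time
import threading
import pynvml

def measure(run_workload, device_index=0, interval=0.05):
    """Time a GPU workload while polling board power via NVML."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples, done = [], threading.Event()

    def poll():
        while not done.is_set():
            # nvmlDeviceGetPowerUsage reports milliwatts
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
            time.sleep(interval)

    poller = threading.Thread(target=poll)
    poller.start()
    start = time.time()
    run_workload()                     # placeholder: launch kernels, synchronize
    elapsed = time.time() - start
    done.set()
    poller.join()
    pynvml.nvmlShutdown()
    avg_watts = sum(samples) / len(samples) if samples else float("nan")
    return elapsed, avg_watts
```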

"There's a lot of really big challenges in this," Harris told his audience, "so those of you that are researchers, there's a lot of hard problems to solve here, so lots of great research areas and directions."

Harris wrapped up his talk with his vision of an ARM-heavy future in which, he said, "it will be very common to be programming on the CUDA platform on ARM processors, on ARM systems across various industries and architectures."

And on discrete GPUs, as well. "Of course, we're Nvidia, so we envision GPUs everywhere," Harris said. ®
