Nvidia shows off superjuiced Kepler GPU

From workhorse to racehorse

HPC blog There were quite a few surprises in today’s GTC12 keynote by NVIDIA CEO and co-founder Jen-Hsun Huang.

If NVIDIA were just introducing a new and faster rev of its latest GPU processor, one that brings three times the performance without breaking the bank on energy usage, that would be a solid win, and in line with expectations. But there was more to this announcement – much more. Our buddy TPM gives the down-and-dirty details on Kepler here.

I’m not sure this is exactly the right analogy, but to me, what NVIDIA has done with Kepler is transform the GPU from a simple task-worker into a much more productive member of the knowledge working class. But Kepler isn’t a paper-shuffling, PowerPoint-wielding MBA type: it still has a solid work ethic, outperforming predecessor Fermi by more than three times. More important than performance, however, is Kepler’s sophistication in processing work.

The features I’m talking about below apply to the Kepler K20, the GK110-based behemoth that’s due in Q4 2012. You’ll also need to be running the new release of CUDA, since it contains the support required to take advantage of these new capabilities.

The first new feature is something called Hyper-Q. With Hyper-Q, a Kepler GPU can now accept work from up to 32 CPU cores simultaneously. Before Hyper-Q, only one CPU core at a time could dispatch work to the GPU, which meant that there were long stretches of time when the GPU would sit idle while waiting for more tasks from whichever CPU core it was working with.

With Hyper-Q, the GPU is now a full-fledged team player in the system, able to accept work from many cores at the same time. This will drive GPU utilisation up, of course, but it should also push CPU utilisation higher, since more CPU cores can dispatch work to, and receive results from, the GPUs simultaneously.
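To make the idea concrete, here's a minimal CUDA sketch of the pattern Hyper-Q accelerates: many independent streams (each of which could belong to a different CPU core or MPI rank) launching work on one GPU. The kernel name `busy_kernel` and the buffer sizes are purely illustrative; on pre-Kepler hardware these launches funnel through a single work queue and largely serialise, while Hyper-Q's 32 hardware queues let them run concurrently. This is a sketch of the concept, not NVIDIA's own example code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for real work; the name and the arithmetic are illustrative.
__global__ void busy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n_streams = 32;   // Hyper-Q exposes 32 hardware work queues
    const int n = 1 << 20;
    cudaStream_t streams[n_streams];
    float *buffers[n_streams];

    for (int s = 0; s < n_streams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], n * sizeof(float));
    }

    // Each stream stands in for work arriving from a different CPU core.
    // Without Hyper-Q these launches largely serialise on the device;
    // with it, independent streams can execute concurrently.
    for (int s = 0; s < n_streams; ++s)
        busy_kernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < n_streams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    printf("done\n");
    return 0;
}
```

The host code is identical either way — the concurrency win comes from the hardware, which is exactly the point: existing multi-stream and MPI codes pick up the benefit without a rewrite.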

The next new wrinkle is something called Dynamic Parallelism, a feature that will also serve to radically increase overall processing speed and system utilisation while reducing programming time and complexity.

Today, without Dynamic Parallelism, GPUs are very fast, but they’re limited in what they can do on their own. Lots of routines are recursive or data dependent, meaning that the results from one set of steps or calculations dictate what happens in the next set of steps or calculations. GPUs can run through these calculations very fast, but then they have to ship the results out to the CPU and wait for further instructions. The CPU then evaluates the results and gives the GPUs another set of tasks to do – perhaps run the same calculations with new data or different assumptions.

But with Dynamic Parallelism, Kepler can run recursive and data-dependent loops right on the GPU – no need to run back to the CPU for instructions. It can crank through almost limitless loops, calculation after calculation, using thousands of cores, and it can spawn new kernels and new processing streams without having to depend on the CPU for directions.
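The heart of Dynamic Parallelism is that a kernel can launch another kernel itself. Below is a minimal sketch of that round-trip elimination; the kernel names `solve` and `refine`, the threshold test, and the data layout are all hypothetical placeholders for a real data-dependent decision (such as checking a residual). Compiling device-side launches requires Compute Capability 3.5 hardware and relocatable device code (roughly `nvcc -arch=sm_35 -rdc=true -lcudadevrt`).

```cuda
#include <cuda_runtime.h>

// Child kernel: one refinement pass over the data (illustrative).
__global__ void refine(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 0.5f;
}

// Parent kernel: makes a data-dependent decision ON THE GPU and, if
// another pass is needed, launches it directly -- no trip back to the
// CPU to evaluate results and issue the next batch of work.
__global__ void solve(float *data, int n, float threshold)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        // In real code this would inspect an error estimate or residual;
        // data[0] stands in for that quantity here.
        if (data[0] > threshold)
            refine<<<(n + 255) / 256, 256>>>(data, n);
    }
}
```

Before Kepler, the `if` above had to live on the host: copy results back, decide, launch again. Moving that control flow onto the device is what cuts out the CPU round-trips the article describes.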

Taking advantage of Dynamic Parallelism will obviously result in higher efficiency and utilisation as highly parallelised work is performed on speedy GPUs, leaving CPUs either free to perform other work or to simply stand quietly off to one side.

I’m not a programmer by any stretch of the imagination – that’s probably obvious. But from what little experience I have, buttressed by conversations with real programmers, it’s clear that using Dynamic Parallelism will also make the CUDA programmer’s job much easier. According to NVIDIA, programming jobs that used to take 300 steps can now be accomplished with as few as 20, because they don’t have to code all of the back-and-forth traffic between CPUs and GPUs.

Just Hyper-Q and Dynamic Parallelism on their own are pretty big steps in the evolution of the GPU and hybrid computing. With the addition of these two features, the GPU is now able to be shared by an entire system, rather than just a single core, and it’s able to generate its own workload and complete much more of that workload without needing to be led through it by a slower, general-purpose CPU.

Before the Kepler K20, the CPU’s role in a hybrid system was mainly as a traffic cop – responsible for sending traffic (data and tasks) to the GPU and accepting the results. With Kepler and its advanced feature set, the GPU can now work for 32 different cops at the same time and manage a larger part of the overall job on its own. This gives the cops more time to handle other tasks, write some parking tickets, or just pull their hats down over their eyes and catch a nap. ®
