Nvidia shows off superjuiced Kepler GPU

From workhorse to racehorse

HPC blog There were quite a few surprises in today’s GTC12 keynote by NVIDIA CEO and co-founder Jen-Hsun Huang.

If NVIDIA were just introducing a new and faster rev of its latest GPU processor, one that brings three times the performance without breaking the bank on energy usage, that would be a solid win, and in line with expectations. But there was more to this announcement – much more. Our buddy TPM gives the down-and-dirty details on Kepler here.

I’m not sure this is exactly the right analogy, but to me, what NVIDIA has done with Kepler is transform the GPU from a simple task-worker into a much more productive member of the knowledge working class. But Kepler isn’t a paper-shuffling, PowerPoint-wielding MBA type: it still has a solid work ethic, outperforming predecessor Fermi by more than three times. More important than performance, however, is Kepler’s sophistication in processing work.

The features I’m talking about below apply to the Kepler K20, the big GK110-based behemoth that’s due in Q4 2012. You’ll also need to be using the new CUDA 5 toolkit, since it supplies the compiler and runtime support needed to take advantage of these new capabilities.

The first new feature is something called Hyper-Q. With Hyper-Q, a Kepler GPU can now accept work from up to 32 CPU cores simultaneously. Before Hyper-Q, only one CPU core at a time could dispatch work to the GPU, which meant that there were long stretches of time when the GPU would sit idle while waiting for more tasks from whichever CPU core it was working with.

With Hyper-Q, the GPU is now a full-fledged team player in the system, able to accept work from many cores at the same time. This will drive GPU utilisation up, of course, but it should also push CPU utilisation higher, since more CPU cores can dispatch work to, and collect results from, the GPU at any given moment.
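
For the code-minded, here’s roughly what that looks like from inside a single program. This is a minimal sketch, not NVIDIA’s example code: 32 CUDA streams each fire off their own kernel (the busy_kernel here is a made-up placeholder). On Fermi, all of those launches funnel through a single hardware work queue and serialise; on a K20, Hyper-Q gives each stream its own queue, so independent kernels can genuinely overlap. Separate MPI processes sharing one GPU get the same benefit.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Made-up placeholder kernel: each launch does independent busywork,
    // so launches in different streams have no reason to wait on each other.
    __global__ void busy_kernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = data[i];
            for (int k = 0; k < 1000; ++k)   // artificial work
                v = v * 1.0000001f + 0.5f;
            data[i] = v;
        }
    }

    int main()
    {
        const int kStreams = 32;   // one launch source per Hyper-Q work queue
        const int n = 1 << 16;
        float *buf[kStreams];
        cudaStream_t stream[kStreams];

        for (int s = 0; s < kStreams; ++s) {
            cudaMalloc(&buf[s], n * sizeof(float));
            cudaMemset(buf[s], 0, n * sizeof(float));
            cudaStreamCreate(&stream[s]);
            // Each kernel goes into its own stream. Pre-Kepler, these
            // serialise through one hardware queue; with Hyper-Q they
            // can run concurrently.
            busy_kernel<<<(n + 255) / 256, 256, 0, stream[s]>>>(buf[s], n);
        }

        cudaDeviceSynchronize();
        for (int s = 0; s < kStreams; ++s) {
            cudaStreamDestroy(stream[s]);
            cudaFree(buf[s]);
        }
        printf("issued %d kernel launches across %d streams\n",
               kStreams, kStreams);
        return 0;
    }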

The next new wrinkle is something called Dynamic Parallelism, a feature that will also serve to radically increase overall processing speed and system utilisation while reducing programming time and complexity.

Today, without Dynamic Parallelism, GPUs are very fast, but they’re limited in what they can do on their own. Lots of routines are recursive or data-dependent, meaning that the results from one set of calculations dictate what happens in the next. GPUs can run through these calculations very quickly, but then they have to ship the results out to the CPU and wait for further instructions. The CPU evaluates the results and hands the GPU another set of tasks – perhaps the same calculations with new data or different assumptions.
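
In code, that old pattern is the familiar launch-copy-decide-repeat loop. Here’s a hypothetical sketch – step_kernel, d_state and d_residual are invented names for this illustration, not anything from the CUDA toolkit – but the shape is the one every GPU programmer knows:

    #include <cuda_runtime.h>

    // Hypothetical one-step kernel: nudges the state and writes out a
    // residual the CPU can inspect. The names are invented for this sketch.
    __global__ void step_kernel(float *state, float *residual, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) state[i] *= 0.9f;        // stand-in calculation
        if (i == 0) *residual = state[0];   // stand-in convergence measure
    }

    // Pre-Kepler pattern: the CPU drives the data-dependent loop.
    void solve_on_host(float *d_state, float *d_residual, int n)
    {
        float residual = 1.0f;
        while (residual > 1e-3f) {
            step_kernel<<<(n + 255) / 256, 256>>>(d_state, d_residual, n);
            // The round trip: ship the result back so the CPU can decide
            // whether another iteration is needed, then launch again.
            cudaMemcpy(&residual, d_residual, sizeof(float),
                       cudaMemcpyDeviceToHost);
        }
    }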

But with Dynamic Parallelism, those recursive, data-dependent loops can run entirely on the GPU – no need to run back to the CPU for instructions. Kepler can run almost limitless loops, cranking through calculation after calculation using thousands of cores, and it can spawn new kernels and new streams of work without having to depend on the CPU for directions.
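
Here’s the same hypothetical loop restructured for Kepler. A single control kernel plays the role the CPU used to play, launching the step kernel and checking the result entirely on the device. Again, the kernel names are mine, not NVIDIA’s; the device-side launch syntax and the build flags are what CUDA 5 documents for dynamic parallelism:

    #include <cuda_runtime.h>

    // Same hypothetical step kernel as in the sketch above.
    __global__ void step_kernel(float *state, float *residual, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) state[i] *= 0.9f;
        if (i == 0) *residual = state[0];
    }

    // With Dynamic Parallelism, one control kernel runs the whole
    // convergence loop on the GPU, playing the role the CPU used to play.
    // Needs compute capability 3.5 (the K20) and CUDA 5.
    __global__ void solve_on_device(float *state, float *residual, int n)
    {
        float r = 1.0f;
        while (r > 1e-3f) {
            // Device-side launch: no trip back to the CPU.
            step_kernel<<<(n + 255) / 256, 256>>>(state, residual, n);
            cudaDeviceSynchronize();   // wait for the child grid to finish
            r = *residual;
        }
    }

    // From the host, one launch replaces the whole loop:
    //     solve_on_device<<<1, 1>>>(d_state, d_residual, n);
    // Build with: nvcc -arch=sm_35 -rdc=true solve.cu -lcudadevrt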

Taking advantage of Dynamic Parallelism will obviously result in higher efficiency and utilisation as highly parallelised work is performed on speedy GPUs, leaving CPUs either free to perform other work or to simply stand quietly off to one side.

I’m not a programmer by any stretch of the imagination – that’s probably obvious. But from what little experience I have, buttressed by conversations with real programmers, it’s clear that Dynamic Parallelism will also make the CUDA programmer’s job much easier. According to NVIDIA, programming jobs that used to take 300 steps can now be accomplished in as few as 20, because programmers no longer have to code all of the back-and-forth traffic between CPUs and GPUs.

Just Hyper-Q and Dynamic Parallelism on their own are pretty big steps in the evolution of the GPU and hybrid computing. With the addition of these two features, the GPU is now able to be shared by an entire system, rather than just a single core, and it’s able to generate its own workload and complete much more of that workload without needing to be led through it by a slower, general-purpose CPU.

Before the Kepler K20, the CPU’s role in a hybrid system was mainly as a traffic cop – responsible for sending traffic (data and tasks) to the GPU and accepting the results. With Kepler and its advanced feature set, the GPU can now work for 32 different cops at the same time and manage a larger part of the overall job on its own. This gives the cops more time to handle other tasks, write some parking tickets, or just pull their hats down over their eyes and catch a nap. ®
