Nvidia shows off superjuiced Kepler GPU

From workhorse to racehorse

HPC blog There were quite a few surprises in today’s GTC12 keynote by NVIDIA CEO and co-founder Jen-Hsun Huang.

If NVIDIA were just introducing a new and faster rev of its latest GPU processor, one that brings three times the performance without breaking the bank on energy usage, that would be a solid win, and in line with expectations. But there was more to this announcement – much more. Our buddy TPM gives the down-and-dirty details on Kepler here.

I’m not sure this is exactly the right analogy, but to me, what NVIDIA has done with Kepler is transform the GPU from a simple task-worker into a much more productive member of the knowledge working class. But Kepler isn’t a paper-shuffling, PowerPoint-wielding MBA type: it still has a solid work ethic, outperforming predecessor Fermi by more than three times. More important than performance, however, is Kepler’s sophistication in processing work.

The features I’m talking about below apply to the Kepler K20, the single-GPU, GK110-based behemoth that’s due in Q4 2012 (the dual-GPU card in the line-up is the K10). You’ll also need to be using the new version of CUDA, since it contains the instructions to take advantage of these new capabilities.

The first new feature is something called Hyper-Q. With Hyper-Q, a Kepler GPU can now accept work from up to 32 CPU cores simultaneously, because it exposes 32 hardware work queues. Before Hyper-Q, all work reached the GPU through a single queue, so in effect only one CPU core at a time could dispatch work to the GPU. That meant long stretches when the GPU would sit idle, waiting for more tasks from whichever CPU core it was working with.

With Hyper-Q, the GPU is now a full-fledged team player in the system, able to accept work from many cores at the same time. This will drive GPU utilisation up, of course, but it will also push CPU utilisation higher, as more CPU cores at a time can dispatch work to the GPUs and receive results from them.
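For the programmers in the audience, here is a rough sketch of the kind of code that stands to benefit. It is not taken from NVIDIA's material: the `scale` kernel and the `run_on_streams` helper are made-up names, and for brevity a single host thread feeds several CUDA streams, where a real application might have separate CPU threads or processes doing the submitting. Each stream carries an independent kernel launch, which is the sort of work that separate hardware queues can accept side by side.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical workload: multiply every element of one slice by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches one independent kernel per stream, each on its own slice.
// Returns the scaled data, or an empty vector if a CUDA call fails.
std::vector<float> run_on_streams(int num_streams, int n_per_stream) {
    const size_t total = static_cast<size_t>(num_streams) * n_per_stream;
    const size_t bytes = total * sizeof(float);
    std::vector<float> host(total, 1.0f);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    std::vector<cudaStream_t> streams(num_streams);
    for (auto &s : streams) cudaStreamCreate(&s);

    const int threads = 128;
    const int blocks = (n_per_stream + threads - 1) / threads;
    for (int s = 0; s < num_streams; ++s) {
        // Each launch goes into its own stream, so none has to wait on another.
        scale<<<blocks, threads, 0, streams[s]>>>(
            dev + static_cast<size_t>(s) * n_per_stream, n_per_stream,
            static_cast<float>(s + 1));
    }

    // Check the launches, wait for every stream, then copy the results back.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);

    for (auto &s : streams) cudaStreamDestroy(s);
    cudaFree(dev);
    return err == cudaSuccess ? host : std::vector<float>{};
}
```

Calling `run_on_streams(4, 256)` would scale four slices by 1, 2, 3 and 4 respectively. The same code runs on older GPUs too; the difference is that the launches are more likely to queue up behind one another there instead of overlapping.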

The next new wrinkle is something called Dynamic Parallelism, a feature that will also serve to radically increase overall processing speed and system utilisation while reducing programming time and complexity.

Today, without Dynamic Parallelism, GPUs are very fast, but they’re limited in what they can do on their own. Lots of routines are recursive or data dependent, meaning that the results from one set of steps or calculations dictate what happens in the next set of steps or calculations. GPUs can run through these calculations very fast, but then they have to ship the results out to the CPU and wait for further instructions. The CPU then evaluates the results and gives the GPUs another set of tasks to do – perhaps run the same calculations with new data or different assumptions.

But with Dynamic Parallelism, GPUs can now run recursive loops right on the GPU – no need to run back to the CPU for instructions. Kepler can run almost limitless loops, cranking through calculation after calculation using thousands of cores. It can spawn new processes and new processing streams without having to depend on the CPU to give it directions.
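In CUDA terms, this means a kernel can launch another kernel. The sketch below is an illustration only, with invented names (`fill_segment`, `dispatch`, `run_nested`): a parent kernel reads per-segment sizes from GPU memory and launches a child grid sized to each one, so the data-dependent decision is made on the GPU. It assumes non-negative segment sizes, a GPU of compute capability 3.5 or later, and a build with relocatable device code enabled (for example `nvcc -rdc=true ... -lcudadevrt` plus a suitable `-arch` setting).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Child grid: fills one segment of the output with a label.
__global__ void fill_segment(int *out, int offset, int count, int label) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[offset + i] = label;
}

// Parent grid: one thread per segment. Each thread reads its segment's size
// from device memory and launches a child grid sized to match, with no
// round trip to the CPU.
__global__ void dispatch(const int *counts, const int *offsets, int *out,
                         int num_segments) {
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg >= num_segments || counts[seg] == 0) return;
    const int threads = 64;
    const int blocks = (counts[seg] + threads - 1) / threads;
    fill_segment<<<blocks, threads>>>(out, offsets[seg], counts[seg], seg + 1);
}

// Host side: uploads the segment sizes, launches the parent once, and
// returns the filled output (empty on failure or when there is no work).
std::vector<int> run_nested(const std::vector<int> &counts) {
    const int num_segments = static_cast<int>(counts.size());
    std::vector<int> offsets(num_segments);
    int total = 0;
    for (int s = 0; s < num_segments; ++s) {
        offsets[s] = total;  // running start position of each segment
        total += counts[s];
    }
    if (total == 0) return {};

    const size_t seg_bytes = num_segments * sizeof(int);
    const size_t out_bytes = total * sizeof(int);
    int *d_counts = nullptr, *d_offsets = nullptr, *d_out = nullptr;
    cudaMalloc(&d_counts, seg_bytes);
    cudaMalloc(&d_offsets, seg_bytes);
    cudaMalloc(&d_out, out_bytes);
    cudaMemcpy(d_counts, counts.data(), seg_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(), seg_bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, out_bytes);

    const int threads = 32;
    const int blocks = (num_segments + threads - 1) / threads;
    dispatch<<<blocks, threads>>>(d_counts, d_offsets, d_out, num_segments);

    // The parent grid only counts as finished once its children have finished,
    // so one host-side synchronise covers the whole tree of launches.
    std::vector<int> out(total);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), d_out, out_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_counts);
    cudaFree(d_offsets);
    cudaFree(d_out);
    return err == cudaSuccess ? out : std::vector<int>{};
}
```

The host issues a single launch and a single wait; how much work each segment gets is decided on the device. Without device-side launches, the CPU would have to read the sizes back and issue each follow-up kernel itself.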

Taking advantage of Dynamic Parallelism will obviously result in higher efficiency and utilisation as highly parallelised work is performed on speedy GPUs, leaving CPUs either free to perform other work or to simply stand quietly off to one side.

I’m not a programmer by any stretch of the imagination – that’s probably obvious. But from what little experience I have, buttressed by conversations with real programmers, it’s clear that using Dynamic Parallelism will also make the CUDA programmer’s job much easier. According to NVIDIA, programming jobs that used to take 300 steps can now be accomplished with as few as 20, because they don’t have to code all of the back-and-forth traffic between CPUs and GPUs.

Just Hyper-Q and Dynamic Parallelism on their own are pretty big steps in the evolution of the GPU and hybrid computing. With the addition of these two features, the GPU is now able to be shared by an entire system, rather than just a single core, and it’s able to generate its own workload and complete much more of that workload without needing to be led through it by a slower, general-purpose CPU.

Before the Kepler K20, the CPU’s role in a hybrid system was mainly as a traffic cop – responsible for sending traffic (data and tasks) to the GPU and accepting the results. With Kepler and its advanced feature set, the GPU can now work for 32 different cops at the same time and manage a larger part of the overall job on its own. This gives the cops more time to handle other tasks, write some parking tickets, or just pull their hats down over their eyes and catch a nap. ®
