The GPU tails wag the CPU dogs at Nvidia show

Where are the Tesla roadmaps?

Updated The Nvidia-sponsored 2010 GPU Technology Conference kicks off today in San Jose, California. All of the key HPC players, as well as some upstarts, will be on hand to try to surf the cresting wave of CPU-GPU hybrid computing models that will no doubt start taking over the HPC centers of the world and move out to our desktops and into corporate data centers in the coming years.

If you were expecting some insight into what Nvidia has cooking for the generation of GPU chips beyond the current "Fermi" chips that are used in the company's GeForce and Quadro graphics cards and Tesla co-processors, then you are going to be disappointed. According to Sumit Gupta, senior product manager of the Tesla line, Nvidia is not talking about roadmaps at the GPU Tech Conference.

And that is a damned shame, because now that the Fermi-based graphics cards and GPU co-processors (both regular C2050 and C2070 PCI-Express cards and fanless M2050 and M2070 models for ceepie-geepie HPC clusters) are in the field, what everyone really wants to know is what Nvidia is going to do next.

There's plenty of chatter, of course, and it doesn't take a genius to figure out what Nvidia's next moves will be. First, there will be a process shrink that allows the company to get more flops out of a GPU, very likely offering roughly twice the GPU cores and twice the oomph of the current machines.

The path is pretty plain. The first-generation Tesla co-processor, the C870, debuted in the summer of 2007 with 128 cores running at 600 MHz and 1.5 GB of GDDR3 memory running at 1.6 GHz, and offered only single-precision floating point math. The C870, which burned 171 watts, was rated at 345.6 gigaflops. The second-generation Tesla 10 GPU co-processors made their debut in November 2008 at the SC08 supercomputing conference in the form of the Tesla C1060, which had 240 cores running at 600 MHz, 4 GB of GDDR3 memory at the same 1.6 GHz speed, and the addition of double-precision math. The C1060 was rated at 622.1 gigaflops on single-precision math, but only 77.8 gigaflops on double precision.

With the Fermi GPUs at the heart of the current generation of Tesla 20 co-processors, Nvidia is shipping the C2050 and C2070, which have 448 cores running at 575 MHz and either 3 GB (C2050) or 6 GB (C2070) of GDDR5 memory running at a much faster 3 GHz. The Tesla 20 GPU co-processors offer more balanced floating point performance, with 1.03 teraflops of single-precision oomph and 515.2 gigaflops of double-precision number crunching. The Tesla 20s, which were announced in November 2009 at the SC09 conference, had the added extra goodie of ECC scrubbing on the GDDR5 memory inside the GPU co-processor - something that a lot of HPC workloads require and something that is missing from AMD's line of FireStream GPU co-processors.
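As a rough cross-check on the figures quoted so far, the sketch below divides each card's peak rating by its core count and clock as given in this article. The `cards` table and the variable names are made up for illustration, and the quoted clock may not be the clock the cores themselves actually run at, so read the result as bookkeeping and not as a statement about the hardware.

```python
# Figures as quoted in the article:
# (cores, clock in MHz, single-precision gigaflops, double-precision gigaflops or None)
cards = {
    "C870":  (128, 600, 345.6, None),
    "C1060": (240, 600, 622.1, 77.8),
    "C2050": (448, 575, 1030.0, 515.2),
}

sp_per_core_mhz = {}  # single-precision flops per core per quoted clock tick
dp_to_sp = {}         # ratio of double-precision to single-precision rating

for name, (cores, mhz, sp_gflops, dp_gflops) in cards.items():
    # gigaflops * 1000 = megaflops; megaflops / MHz = flops per clock tick
    sp_per_core_mhz[name] = sp_gflops * 1000 / (cores * mhz)
    if dp_gflops is not None:
        dp_to_sp[name] = dp_gflops / sp_gflops

for name in cards:
    line = f"{name}: {sp_per_core_mhz[name]:.2f} SP flops per core per quoted clock tick"
    if name in dp_to_sp:
        line += f", DP/SP = {dp_to_sp[name]:.3f}"
    print(line)
```

On these numbers the double-precision rating moves from roughly one eighth of the single-precision rating on the C1060 to roughly one half on the C2050, which is the "more balanced" performance described above.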

A betting man would say that at SC10 this year in New Orleans Nvidia will be talking about the guts behind the forthcoming Tesla 30 co-processors and related discrete GPU graphics cards. The GPUs were designed with 128, 256, and 512 cores in the first three generations, delivering 128, 240, and 448 working cores after the boogers in each chip were de-allocated.

It is a pretty safe bet that Nvidia is trying to cram 1,024 cores into its next GPU design. Based on current trends, where a successively larger percentage of the cores doesn't make it, a fair guess is that 838 cores will be live in whatever future designs come out, unless Taiwan Semiconductor Manufacturing Company gets better yields on future processes than it is getting on current ones.
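The yield trend behind that guess can be sketched in a few lines. This is only an illustration: it assumes the working fraction keeps dropping by the same average step each generation, which is one of several ways to extend three data points, and the variable names are invented for the example.

```python
# Designed vs. working core counts for the first three generations, as given in the article.
designed = [128, 256, 512]
working = [128, 240, 448]

# Fraction of designed cores that ended up live in each generation.
fractions = [w / d for w, d in zip(working, designed)]

# Assumption: the working fraction keeps falling by the same average step per generation.
step = (fractions[-1] - fractions[0]) / (len(fractions) - 1)
next_fraction = fractions[-1] + step

next_designed = 1024  # the article's guess at the next design
next_working = round(next_designed * next_fraction)

print([f"{f:.2%}" for f in fractions])
print(f"projected: {next_fraction:.2%} of {next_designed} = {next_working} cores")
```

A straight-line continuation of this kind lands at 832 working cores, in the same neighborhood as the 838 guessed above. The exact figure depends on how the trend is fitted.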

Adding so many cores on what I will call the Tesla 30 GPU co-processors probably means dropping the clock speed a bit, too. Maybe to somewhere around 500 MHz or so, depending on how hot the chip gets. If this is the case, then the single-precision math on such a future Tesla 30 GPU co-processor would come in at 1.5 teraflops or so. If the clock speed can be pushed up to 600 MHz, that gets you about 1.8 teraflops.

Obviously, having more cores not be duds means you can drop the clock speed and still get the same flops. In an ideal universe, all 1,024 potential cores would run at 600 MHz and you'd get 2.2 teraflops. I am assuming that Nvidia can keep double-precision math to half the rate of single-precision math going forward, and I bet Tesla customers are, too. And if the naming conventions mean anything, then this future GPU co-processor will be called the C3040, with a C3060 variant with extra GDDR5 memory.
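The Tesla 30 estimates above hang together if peak flops are assumed to scale linearly with both live core count and clock speed. A minimal sketch of that scaling follows, taking the 1.5 teraflops guess at 838 cores and 500 MHz as the baseline; the function name `scale_tflops` is hypothetical and the linear model is an assumption, not anything Nvidia has said.

```python
def scale_tflops(base_tflops, base_cores, base_mhz, cores, mhz):
    """Scale a peak-flops estimate linearly with core count and clock speed."""
    return base_tflops * (cores / base_cores) * (mhz / base_mhz)

# Baseline taken from the article: 838 live cores at 500 MHz, about 1.5 teraflops.
BASE = (1.5, 838, 500)

faster_clock = scale_tflops(*BASE, cores=838, mhz=600)
all_cores = scale_tflops(*BASE, cores=1024, mhz=600)

# Assumes double precision stays at half the single-precision rate.
all_cores_dp = all_cores / 2

print(f"838 cores @ 600 MHz:  {faster_clock:.1f} TF")
print(f"1024 cores @ 600 MHz: {all_cores:.1f} TF SP, {all_cores_dp:.1f} TF DP")
```

Under those assumptions the sketch reproduces the 1.8 and 2.2 teraflops figures, and puts double precision for the ideal all-cores part at about 1.1 teraflops.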

I'll take my Tesla 30s now, Nvidia. Thank you very much. Now let's talk about the Tesla 40s...

But seriously, there is one other interesting possibility that Nvidia could throw into some future generation of GPU co-processor, and it reminds me of an old joke: A man walks into a doctor's office with a chicken on his head and the chicken says, "Hey, doc, can you cut this idiot off my ass?"
