The GPU tails wag the CPU dogs at Nvidia show

Where are the Tesla roadmaps?


Updated The Nvidia-sponsored 2010 GPU Technology Conference kicks off today in San Jose, California, and all of the key HPC players as well as some upstarts will be on hand to try to surf the cresting wave of CPU-GPU hybrid computing, which will no doubt start taking over the HPC centers of the world and then move out to our desktops and into corporate data centers in the coming years.

If you were expecting some insight into what Nvidia has cooking for the generation of GPU chips beyond the current "Fermi" chips that are used in the company's GeForce and Quadro graphics cards and Tesla co-processors, then you are going to be disappointed. According to Sumit Gupta, senior product manager of the Tesla line, Nvidia is not talking about roadmaps at the GPU Tech Conference.

And that is a damned shame, because now that the Fermi-based graphics cards and GPU co-processors (both the regular C2050 and C2070 PCI-Express cards and the fanless M2050 and M2070 models for ceepie-geepie HPC clusters) are in the field, what everyone really wants to know is what Nvidia is going to do next.

There's plenty of chatter, of course, and it doesn't take a genius to figure out what Nvidia's next moves will be. First, there will be a process shrink that allows the company to get more flops out of a GPU, very likely offering roughly twice the GPU cores and twice the oomph of the current machines.

The path is pretty plain. The first-generation Tesla co-processor, the C870, debuted in the summer of 2007 with 128 cores running at 600 MHz, 1.5 GB of GDDR3 memory running at 1.6 GHz, and support for single-precision floating point math only. The C870, which burned 171 watts, was rated at 345.6 gigaflops. The second-generation Tesla 10 GPU co-processors made their debut in the Tesla C1060, launched at the SC08 supercomputing conference in November 2008. The C1060 had 240 cores running at 600 MHz, 4 GB of GDDR3 memory at the same 1.6 GHz speed, and added double-precision math. It was rated at 622.1 gigaflops on single-precision math, but only 77.8 gigaflops on double precision.
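The rated figures above fall out of a simple peak-throughput formula: cores × flops per clock × clock speed, counting a multiply-add as two flops. The catch, and the main assumption in this sketch, is which clock to use: the 600 MHz figures are the chips' core clocks, while the CUDA cores run on a faster shader clock (1.35 GHz on the C870, 1.296 GHz on the C1060). The C1060's double-precision figure comes from its 30 double-precision units, one per streaming multiprocessor.

```python
# Peak throughput for the first two Tesla generations.
# Assumption: each unit retires one multiply-add (2 flops) per shader-clock cycle,
# and the shader clocks (not the 600 MHz core clocks) set the pace.

def peak_gflops(units: int, shader_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in gigaflops: units x flops per clock x clock in GHz."""
    return units * flops_per_clock * shader_ghz

# (label, execution units, shader clock in GHz)
parts = [
    ("C870 single precision", 128, 1.35),
    ("C1060 single precision", 240, 1.296),
    ("C1060 double precision", 30, 1.296),  # one double-precision unit per SM, 30 SMs
]

for label, units, ghz in parts:
    print(f"{label}: {peak_gflops(units, ghz):.1f} gigaflops")
```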

With the Fermi GPUs at the heart of the current generation of Tesla 20 co-processors, Nvidia is shipping the C2050 and C2070, which have 448 cores running at 575 MHz and either 3 GB (C2050) or 6 GB (C2070) of GDDR5 memory running at a much faster 3 GHz. The Tesla 20 GPU co-processors offer more balanced floating point performance, with 1.03 teraflops of single-precision oomph and 515.2 gigaflops of double-precision number crunching. The Tesla 20s, which were announced in November 2009 at the SC09 conference, also brought ECC scrubbing on the GDDR5 memory inside the GPU co-processor, something that a lot of HPC workloads require and that is missing from AMD's line of FireStream GPU co-processors.
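The "more balanced" claim can be checked with the same kind of arithmetic. This sketch assumes the C2050's cores run on a 1.15 GHz shader clock (twice the 575 MHz core clock quoted above) and that Fermi issues double-precision work at half the single-precision rate; under those assumptions it reproduces the rated figures. The C1060 numbers are simply the rated ones quoted earlier.

```python
# Double- vs single-precision balance, Tesla 10 versus Tesla 20.

def peak_gflops(lanes: int, shader_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in gigaflops, counting a fused multiply-add as 2 flops."""
    return lanes * flops_per_clock * shader_ghz

fermi_cores = 448
fermi_shader_ghz = 1.15  # assumption: shader clock at twice the 575 MHz core clock

sp = peak_gflops(fermi_cores, fermi_shader_ghz)        # single precision
dp = peak_gflops(fermi_cores // 2, fermi_shader_ghz)   # half-rate double precision

c1060_sp, c1060_dp = 622.1, 77.8  # rated gigaflops for the C1060

print(f"C2050: {sp:.1f} SP, {dp:.1f} DP gigaflops, DP/SP = {dp / sp:.3f}")
print(f"C1060: {c1060_sp} SP, {c1060_dp} DP gigaflops, DP/SP = {c1060_dp / c1060_sp:.3f}")
```

The ratio jumps from one-eighth on the C1060 to one-half on the C2050, which is what makes the Fermi parts interesting for double-precision HPC codes.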

A betting man would say that at SC10 in New Orleans this year, Nvidia will be talking about the guts behind the forthcoming Tesla 30 co-processors and related discrete GPU graphics cards. The GPUs were designed with 128, 256, and 512 cores in the first three generations, delivering 128, 240, and 448 working cores after the boogers in each chip were de-allocated.

It is a pretty safe bet that Nvidia is trying to cram 1,024 cores into its next GPU design, and based on current trends, where a successively larger percentage of the cores don't make it, a fair guess is that 838 cores will be live in whatever future designs come out, unless Taiwan Semiconductor Manufacturing Corp gets better yields on future processes than it is getting on current ones.
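The yield arithmetic can be made explicit. This is a straight-line extrapolation of the three data points given above, one rough way among several to project the trend: the enabled fraction falls from 100 per cent to 93.75 and then 87.5 per cent, so another 6.25-point drop on a 1,024-core design leaves 832 live cores, in the same ballpark as the 838 guess.

```python
# Fraction of designed cores that ship enabled, per generation,
# using the designed/working core counts from the text.
designed = [128, 256, 512]
working = [128, 240, 448]

fractions = [w / d for w, d in zip(working, designed)]

# Assumption: the enabled fraction keeps falling by the same step each generation,
# and the designed core count keeps doubling.
step = fractions[-1] - fractions[-2]
next_designed = designed[-1] * 2
next_fraction = fractions[-1] + step
next_working = round(next_designed * next_fraction)

print("enabled fractions:", fractions)
print(f"next design: {next_designed} cores, {next_fraction:.2%} enabled, {next_working} live")
```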

Adding so many cores to what I will call the Tesla 30 GPU co-processors probably means dropping the clock speed a bit, too, maybe to somewhere around 500 MHz, depending on how hot the chip gets. If this is the case, then single-precision math on such a future Tesla 30 GPU co-processor would come in at 1.5 teraflops or so. If the clock speed can be pushed up to 600 MHz, that gets you about 1.8 teraflops.

Obviously, having fewer cores turn out to be duds means you can drop the clock speed and still get the same flops. In an ideal universe, all 1,024 potential cores would run at 600 MHz and you'd get 2.2 teraflops. I am assuming that Nvidia can keep double-precision math to half the rate of single-precision math going forward, and I bet Tesla customers are, too. And if the naming conventions mean anything, then this future GPU co-processor will be called the C3040, with a C3060 variant with extra GDDR5 memory.
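The scenarios in the last two paragraphs can be run through a simple linear-scaling model. This is a back-of-envelope sketch under assumptions of our own, not Nvidia figures: the shader clock stays at twice the core clock (as on the C2050, where it reproduces the 1.03 teraflops rating), each core does one fused multiply-add per shader clock, and double precision runs at half rate. Under this model the 838-core part lands at roughly 1.7 teraflops at 500 MHz and 2.0 teraflops at 600 MHz, and the ideal 1,024-core part at about 2.5 teraflops, somewhat above the round numbers in the text; the gap comes down to how the clocks are assumed to scale.

```python
# Linear-scaling projection for a hypothetical "Tesla 30" part.
# Assumptions: shader clock = 2 x core clock, 2 flops per core per shader clock,
# double precision at half the single-precision rate.

def projected_gflops(cores: int, core_mhz: float,
                     shader_multiplier: float = 2.0,
                     flops_per_clock: int = 2) -> float:
    """Single-precision peak in gigaflops for a core count and core clock in MHz."""
    shader_ghz = core_mhz * shader_multiplier / 1000
    return cores * flops_per_clock * shader_ghz

# Sanity check against the shipping C2050: 448 cores at a 575 MHz core clock.
print(f"C2050 check: {projected_gflops(448, 575):.1f} gigaflops")

scenarios = [
    ("838 cores @ 500 MHz", 838, 500),
    ("838 cores @ 600 MHz", 838, 600),
    ("1,024 cores @ 600 MHz", 1024, 600),
]

for label, cores, mhz in scenarios:
    single = projected_gflops(cores, mhz)
    double = single / 2  # half-rate double precision, as assumed in the text
    print(f"{label}: {single / 1000:.2f} TF single, {double / 1000:.2f} TF double")
```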

I'll take my Tesla 30s now, Nvidia. Thank you very much. Now let's talk about the Tesla 40s...

But seriously, there is one other interesting possibility that Nvidia could throw into some future generation of GPU co-processor, and it reminds me of an old joke: A man walks into a doctor's office with a chicken on his head and the chicken says, "Hey, doc, can you cut this idiot off my ass?"
