
The GPU tails wag the CPU dogs at Nvidia show

Where are the Tesla roadmaps?


Updated The Nvidia-sponsored 2010 GPU Technology Conference kicks off today in San Jose, California, and all of the key HPC players, as well as some upstarts, will be on hand to try to surf the cresting wave of CPU-GPU hybrid computing. Those hybrid models will no doubt start taking over the HPC centers of the world, and in the coming years they will move out to our desktops and into corporate data centers.

If you were expecting some insight into what Nvidia has cooking for the generation of GPU chips beyond the current "Fermi" chips that are used in the company's GeForce and Quadro graphics cards and Tesla co-processors, then you are going to be disappointed: according to Sumit Gupta, senior product manager of the Tesla line, Nvidia is not talking about roadmaps at the GPU Tech Conference.

And that is a damned shame, because now that the Fermi-based graphics cards and GPU co-processors (both the regular C2050 and C2070 PCI-Express cards and the fanless M2050 and M2070 models for ceepie-geepie HPC clusters) are in the field, what everyone really wants to know is what Nvidia is going to do next.

There's plenty of chatter, of course, and it doesn't take a genius to figure out what Nvidia's next moves will be. First, there will be a process shrink that allows the company to get more flops out of a GPU, very likely offering roughly twice the GPU cores and twice the oomph of the current machines.

The path is pretty plain. The first generation Tesla co-processor, the C870, debuted in the summer of 2007 with 128 cores running at 600 MHz and 1.5 GB of GDDR3 memory running at 1.6 GHz, and offered only single-precision floating point math. The C870, which burned 171 watts, was rated at 345.6 gigaflops. The second generation Tesla 10 GPU co-processors made their debut in November 2008 at the SC08 supercomputing conference in the form of the Tesla C1060, which had 240 cores running at 600 MHz, 4 GB of GDDR3 memory at the same 1.6 GHz speed, and the addition of double-precision math. The C1060 was rated at 622.1 gigaflops on single-precision math, but only 77.8 gigaflops on double precision.

With the Fermi GPUs at the heart of the current generation of Tesla 20 co-processors, Nvidia is shipping the C2050 and C2070, which have 448 cores running at 575 MHz and either 3 GB (C2050) or 6 GB (C2070) of GDDR5 memory running at a much faster 3 GHz. The Tesla 20 GPU co-processors offer more balanced floating point performance, with 1.03 teraflops of single-precision oomph and 515.2 gigaflops of double-precision number crunching. The Tesla 20s, which were announced in November 2009 at the SC09 conference, had the added extra goodie of ECC scrubbing on the GDDR5 memory inside the GPU co-processor - something that a lot of HPC workloads require and something that is missing from AMD's line of FireStream GPU co-processors.
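To put a number on "more balanced," here is a minimal Python sketch that divides the double-precision rating by the single-precision rating for each generation. It uses only the peak figures quoted above, with 1.03 teraflops written as 1,030 gigaflops; the dictionary and function names are made up for this illustration.

```python
# Peak ratings as quoted in the article, in gigaflops.
# The C870 offered single-precision math only, so its DP entry is None.
TESLA_PEAKS = {
    "C870": {"sp": 345.6, "dp": None},
    "C1060": {"sp": 622.1, "dp": 77.8},
    "C2050/C2070": {"sp": 1030.0, "dp": 515.2},
}


def dp_to_sp_ratio(sp_gflops, dp_gflops):
    """Return double-precision peak as a fraction of single-precision peak."""
    if dp_gflops is None:
        return None
    return dp_gflops / sp_gflops


ratios = {
    name: dp_to_sp_ratio(peak["sp"], peak["dp"])
    for name, peak in TESLA_PEAKS.items()
}

for name, ratio in ratios.items():
    if ratio is None:
        print(f"{name}: no double-precision support")
    else:
        print(f"{name}: DP is {ratio:.1%} of SP")
```

On these figures the C1060 manages about one-eighth of its single-precision rate in double precision, while the Fermi-based cards manage about half.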

A betting man would say that at SC10 this year in New Orleans Nvidia will be talking about the guts behind the forthcoming Tesla 30 co-processors and related discrete GPU graphics cards. The GPUs were designed with 128, 256, and 512 cores in the first three generations, delivering 128, 240, and 448 working cores after the boogers in each chip were de-allocated.

It is a pretty safe bet that Nvidia is trying to cram 1,024 cores into its next GPU design. Based on current trends, where a successively larger percentage of the cores don't make it, a fair guess is that 838 cores will be live in whatever future designs come out, unless Taiwan Semiconductor Manufacturing Corp gets better yields on future processes than it is getting on current ones.
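The article does not spell out how the 838 figure was reached. One extrapolation that lands on it, offered here purely as an assumption, is to take the fraction of working cores in each generation, average the generation-to-generation decline, and apply that decline once more to a 1,024-core design. The variable names below are hypothetical.

```python
# Designed versus working core counts for the first three generations,
# as given in the article.
designed = [128, 256, 512]
working = [128, 240, 448]

# Fraction of designed cores that ended up live in each generation.
fractions = [w / d for w, d in zip(working, designed)]

# Generation-to-generation decline in that fraction.
declines = [fractions[i + 1] / fractions[i] for i in range(len(fractions) - 1)]

# Assumption: the next generation declines by the average of past declines.
mean_decline = sum(declines) / len(declines)
next_fraction = fractions[-1] * mean_decline

# Apply the projected fraction to the rumored 1,024-core design.
next_working = round(1024 * next_fraction)

print(f"working fractions so far: {[round(f, 4) for f in fractions]}")
print(f"projected working fraction: {next_fraction:.4f}")
print(f"projected live cores out of 1,024: {next_working}")
```

A straight-line extrapolation of the same fractions would give a slightly lower count, so treat this as one plausible reading of the trend rather than the method behind the guess.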

Adding so many cores on what I will call the Tesla 30 GPU co-processors probably means dropping the clock speed a bit, too. Maybe to somewhere around 500 MHz or so, depending on how hot the chip gets. If this is the case, then the single precision math on such a future Tesla 30 GPU co-processor would come in at 1.5 teraflops or so. If the clock speed can be pushed up to 600 MHz, that gets you about 1.8 teraflops.

Obviously, having more cores not be duds means you can drop the clock speed and still get the same flops. In an ideal universe, all 1,024 potential cores would run at 600 MHz and you'd get 2.2 teraflops. I am assuming that Nvidia can keep double-precision math to half the rate of single-precision math going forward, and I bet Tesla customers are, too. And if the naming conventions mean anything, then this future GPU co-processor will be called the C3040, with a C3060 variant with extra GDDR5 memory.
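The three projections above hang together if peak flops scale linearly with both core count and clock speed. As a rough sketch under that assumption, the Python below takes the 1.5 teraflops guess at 838 cores and 500 MHz as its baseline and scales from there; the baseline constants and function names are illustrative, and the half-rate double-precision figure is the assumption stated above.

```python
# Baseline taken from the article's own guess for a Tesla 30 part.
BASE_CORES = 838
BASE_MHZ = 500
BASE_SP_TFLOPS = 1.5


def scaled_sp_tflops(cores, mhz):
    """Scale the baseline single-precision peak linearly by cores and clock."""
    return BASE_SP_TFLOPS * (cores / BASE_CORES) * (mhz / BASE_MHZ)


def dp_tflops(sp_tflops):
    """Assume double precision runs at half the single-precision rate."""
    return sp_tflops / 2


scenarios = [(838, 500), (838, 600), (1024, 600)]
for cores, mhz in scenarios:
    sp = scaled_sp_tflops(cores, mhz)
    print(f"{cores} cores at {mhz} MHz: "
          f"{sp:.1f} TF single, {dp_tflops(sp):.1f} TF double")
```

The same function shows the trade-off between yield and clock speed: for example, scaled_sp_tflops(1024, 500) comes out a little above the 1.8 teraflops of the 838-core, 600 MHz case.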

I'll take my Tesla 30s now, Nvidia. Thank you very much. Now let's talk about the Tesla 40s...

But seriously, there is one other interesting possibility that Nvidia could throw into some future generation of GPU co-processor, and it reminds me of an old joke: A man walks into a doctor's office with a chicken on his head and the chicken says, "Hey, doc, can you cut this idiot off my ass?"
