Nvidia snaps out snappier Tesla GPU coprocessors

All fired up on all 512 cores

GPU chipmaker Nvidia knows that it has to do more to grow its Tesla biz than slap some passive heat sinks on a fanless GPU card and talk up its CUDA parallel-programming tools. It has to keep delivering price/performance improvements, as well.

And that's exactly what it's doing with the new Tesla M2090 GPU coprocessor.

Back when the "Fermi" GPU chips were previewed at the SC2009 supercomputing event a year and a half ago, Nvidia showed off a chip with 512 cores, plus L1 and L2 cache memories for those cores (this was new) and ECC memory scrubbing (also new). The design grouped the cores into 16 streaming multiprocessors of 32 cores each, with 64KB of L1 cache per streaming multiprocessor, and added a higher-level 768KB L2 cache that all the cores can share.
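The core and cache arithmetic in that description can be checked in a few lines of Python. This is a minimal sketch: the `FermiConfig` class and its field names are hypothetical, and it simply treats the 64KB figure as L1 capacity attached to each streaming multiprocessor, as the text describes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FermiConfig:
    """Hypothetical model of the full Fermi layout described in the article."""
    sm_count: int = 16        # streaming multiprocessors on the chip
    cores_per_sm: int = 32    # cores in each streaming multiprocessor
    l1_kb_per_sm: int = 64    # L1 cache attached to each SM
    l2_kb_shared: int = 768   # L2 cache shared by all cores

    @property
    def total_cores(self) -> int:
        return self.sm_count * self.cores_per_sm

    @property
    def total_l1_kb(self) -> int:
        # Aggregate L1 capacity summed across every SM
        return self.sm_count * self.l1_kb_per_sm


fermi = FermiConfig()
print(fermi.total_cores)
print(fermi.total_l1_kb)
```

Multiplying out the fields gives the 512 cores quoted above and 1,024KB of L1 in aggregate, alongside the single shared 768KB L2.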

That Fermi chip sported GDDR5 memory controllers, and the cards using the Fermi chips (either as discrete graphics cards or GPU coprocessors for accelerating floating point calculations) could have 3GB or 6GB of main memory. The memory controllers on the Fermi GPUs can address up to 1TB of memory, in theory.
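The gap between the cards' 3GB or 6GB and the controllers' theoretical 1TB ceiling is easier to see in address bits. A minimal sketch, assuming byte addressing and binary units (1GB = 2^30 bytes, 1TB = 2^40 bytes); the article does not say how Nvidia counts them, and `address_bits` is a hypothetical helper.

```python
GIB = 2**30  # assumed binary gigabyte
TIB = 2**40  # assumed binary terabyte


def address_bits(capacity_bytes: int) -> int:
    """Minimum number of bits needed to give every byte in capacity_bytes its own address."""
    if capacity_bytes < 1:
        raise ValueError("capacity must be at least one byte")
    # Addresses run from 0 to capacity_bytes - 1
    return (capacity_bytes - 1).bit_length()


for label, size in [("3GB card", 3 * GIB), ("6GB card", 6 * GIB), ("1TB ceiling", TIB)]:
    print(f"{label}: {address_bits(size)} address bits")
```

Under those assumptions the shipping cards need 32 or 33 bits, while the 1TB ceiling corresponds to 40.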

But in the chip racket, theory does not always become practice on the first iteration of a product, and so it was with the Fermi GPUs.

When the Fermi chips started shipping in the Tesla line of GPU coprocessors in May 2010, the initial Teslas had only 448 cores activated. Nvidia never explained this, but most people surmised that this had to do with yield issues (gunk on some cores in the chip) and the chips generating too much heat at a particular clock speed.

With those 448 cores running at 1.15GHz and GDDR5 memory chips running at 1.56GHz, the Tesla M2050 GPU coprocessor was rated at 515 gigaflops of double-precision and 1.03 teraflops of single-precision floating-point performance.
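The rated figures fall out of a simple peak-throughput formula. This is a sketch under two assumptions the article does not spell out: each core completes one fused multiply-add (counted as two floating-point operations) per clock, and double precision runs at half the single-precision rate. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak single-precision gigaflops.

    Assumes each core retires one fused multiply-add (two flops) per clock.
    """
    return cores * clock_ghz * flops_per_core_per_clock


sp = peak_gflops(448, 1.15)   # Tesla M2050: 448 cores at 1.15GHz
dp = sp / 2                   # assumed half-rate double precision
print(f"SP: {sp:.1f} gigaflops, DP: {dp:.1f} gigaflops")
```

With the M2050's core count and clock this gives 1,030.4 and 515.2 gigaflops, matching the rated 1.03 teraflops and 515 gigaflops.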

The Tesla M2050 is a single-wide PCI-Express 2.0 device that has 3GB of GDDR5 memory, while the M2070 is a two-slot device that packs 6GB of memory and has the same floppish performance.

Both are rated at a top-end 225 watts of peak power draw, but Nvidia says the actual heat thrown off by the devices is often a lot less and depends on the workload. That is a little less than the 238 watts of the Tesla C2050 and C2070 coprocessors, which have fans built into them and are aimed at goosing the number-crunching power of workstations to create a "personal supercomputer", although these devices, too, are rated at the same 515 gigaflops of double-precision and 1.03 teraflops of single-precision performance.

Sumit Gupta, senior product manager of the Tesla line at Nvidia, says that the Fermi GPUs used in the new M2090 coprocessors are not just a bin sort, looking for Fermis with more working cores or clocks that can run faster reliably. Nvidia has actually done a new tape-out of the Fermi design using Taiwan Semiconductor Manufacturing Corp's 40-nanometer processes, which Gupta says have some improvements that make chips run better.

When you add up some nips and tucks here and there on the Fermi chip plus the process improvements from TSMC, Nvidia can crank up the Fermi core clock speed by 13 per cent to 1.3GHz, and the GDDR5 memory speed by 18.6 per cent, to 1.85GHz, on the Tesla M2090.
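Both percentages can be checked against the M2050 clocks given earlier. A minimal sketch; `pct_increase` is a hypothetical helper, and the inputs are the article's rounded GHz figures.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


core = pct_increase(1.15, 1.3)     # M2050 -> M2090 core clock, GHz
memory = pct_increase(1.56, 1.85)  # M2050 -> M2090 GDDR5 clock, GHz
print(f"core clock +{core:.1f}%, memory clock +{memory:.1f}%")
```

The core bump works out to about 13.0 per cent and the memory bump to about 18.6 per cent, in line with the figures above.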

Nvidia's Tesla M2090 server GPU coprocessor

Those increases help performance considerably. And so does the fact that with the TSMC process improvement, Nvidia can now have all 512 cores in the Fermi design activated, which yields a theoretical 14.3 per cent improvement over those initial Fermi chips with only 448 active cores.
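Taken together with the clock increase described earlier, the extra cores compound. This sketch uses the usual peak-flops assumptions (two flops per core per clock via fused multiply-add, half-rate double precision), which the M2050's rated 1.03 teraflops single and 515 gigaflops double precision are consistent with. The results are theoretical peaks derived from the article's numbers, not Nvidia's published M2090 ratings, and the helper name is hypothetical.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    # Assumes one fused multiply-add (two flops) per core per clock
    return cores * clock_ghz * flops_per_clock


core_gain = 512 / 448 - 1             # extra cores alone
m2050_sp = peak_gflops(448, 1.15)     # initial Fermi Tesla
m2090_sp = peak_gflops(512, 1.3)      # all cores active, faster clock
m2090_dp = m2090_sp / 2               # assumed half-rate double precision
total_gain = m2090_sp / m2050_sp - 1  # cores and clock combined

print(f"extra cores: +{core_gain:.1%}")
print(f"M2090 peak: {m2090_sp:.1f} SP / {m2090_dp:.1f} DP gigaflops")
print(f"combined gain over M2050: +{total_gain:.1%}")
```

The core count alone accounts for the 14.3 per cent; multiplied by the roughly 13 per cent clock gain, the theoretical peak rises by about 29 per cent.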
