Nvidia snaps out snappier Tesla GPU coprocessors
All fired up on all 512 cores
GPU chipmaker Nvidia knows that it has to do more to grow its Tesla biz than slap some passive heat sinks on a fanless GPU card and talk up its CUDA parallel-programming tools. It has to keep delivering price/performance improvements, as well.
And that's exactly what it's doing with the new Tesla M2090 GPU coprocessor.
Back when the "Fermi" GPU chips were previewed at the SC2009 supercomputing event a year and a half ago, Nvidia showed off a chip with 512 cores, plus L1 and L2 cache memories for those cores (this was new) and ECC memory scrubbing (also new). The design bundled those cores into 16 streaming multiprocessors of 32 cores each, with 64KB of L1 cache per multiprocessor and a higher-level L2 cache weighing in at 768KB that all the cores can share.
That Fermi chip sported GDDR5 memory controllers, and the cards using the Fermi chips (either as discrete graphics cards or GPU coprocessors for accelerating floating point calculations) could have 3GB or 6GB of main memory. The memory controllers on the Fermi GPUs can address up to 1TB of memory, in theory.
But in the chip racket, theory does not always happen on the first iteration of a product, and so it was with the Fermi GPUs.
When the Fermi chips started shipping in the Tesla line of GPU coprocessors in May 2010, the initial Teslas had only 448 cores activated. Nvidia never explained this, but most people surmised that this had to do with yield issues (gunk on some cores in the chip) and the chips generating too much heat at a particular clock speed.
With those 448 cores running at 1.15GHz and GDDR5 memory chips running at 1.56GHz, the Tesla M2050 GPU coprocessor was rated at 515 gigaflops double-precision and 1.03 teraflops single-precision when performing floating-point operations.
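For readers who want to see where those ratings come from, here is a rough back-of-envelope sketch. It assumes something the figures above do not spell out: that each core can retire one fused multiply-add (two floating-point operations) per clock, and that double precision runs at half the single-precision rate. The function name `peak_gigaflops` is made up for illustration.

```python
# Back-of-envelope peak throughput for the Tesla M2050, using the core count
# and clock speed quoted in the article.
# Assumptions (not stated in the article): one fused multiply-add per core
# per clock, counted as 2 flops, and double precision at half the
# single-precision rate.

FLOPS_PER_CORE_PER_CLOCK = 2  # assumed: one fused multiply-add = 2 flops
DOUBLE_PRECISION_RATE = 0.5   # assumed: DP runs at half the SP rate


def peak_gigaflops(cores, clock_ghz):
    """Return (single_precision, double_precision) peak gigaflops."""
    single = cores * clock_ghz * FLOPS_PER_CORE_PER_CLOCK
    double = single * DOUBLE_PRECISION_RATE
    return single, double


sp, dp = peak_gigaflops(cores=448, clock_ghz=1.15)
print(f"M2050 single precision: {sp:.1f} gigaflops")
print(f"M2050 double precision: {dp:.1f} gigaflops")
```

Under those assumptions the arithmetic lands on roughly 1,030 and 515 gigaflops, which lines up with the rated figures. These are theoretical peaks, not what a real workload will see.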
The Tesla M2050 is a single-wide PCI-Express 2.0 device that has 3GB of GDDR5 memory, while the M2070 is a two-slot device that packs 6GB of memory and has the same floppish performance.
Both are rated at a top-end 225 watts of peak power draw, but Nvidia says the actual heat thrown off by the device is often a lot less and depends on the workload. That is a little bit less than the 238 watts drawn by the Tesla C2050 and C2070 coprocessors, which have fans built into them and which are aimed at goosing the number-crunching power of workstations to create a "personal supercomputer" – although these devices, too, are rated at the same 515 gigaflops double-precision and 1.03 teraflops single-precision.
Sumit Gupta, senior product manager of the Tesla line at Nvidia, says that the Fermi GPUs used in the new M2090 coprocessors are not just a bin sort, looking for Fermis with more working cores or clocks that can run faster reliably. Nvidia has actually done a new tape-out of the Fermi design using Taiwan Semiconductor Manufacturing Company's 40-nanometer processes, which Gupta says have some improvements that make chips run better.
When you add up some nips and tucks here and there on the Fermi chip plus the process improvements from TSMC, Nvidia can crank up the Fermi core clock speed by 13 per cent to 1.3GHz, and the GDDR5 memory speed by 18.6 per cent, to 1.85GHz, on the Tesla M2090.
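The percentages are easy to check against the old and new clock speeds. A minimal sketch, with a hypothetical helper named `percent_increase`:

```python
# Check the clock-speed uplifts quoted in the article:
# core clock 1.15GHz -> 1.3GHz, GDDR5 memory clock 1.56GHz -> 1.85GHz.


def percent_increase(old, new):
    """Relative increase from old to new, expressed in per cent."""
    return (new - old) / old * 100.0


core_uplift = percent_increase(1.15, 1.30)
memory_uplift = percent_increase(1.56, 1.85)

print(f"Core clock uplift:   {core_uplift:.1f} per cent")
print(f"Memory clock uplift: {memory_uplift:.1f} per cent")
```

Rounded to one decimal place, these come out at 13.0 and 18.6 per cent, matching the figures above.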
Nvidia's Tesla M2090 server GPU coprocessor
Those increases help performance considerably. And so does the fact that with the TSMC process improvement, Nvidia can now have all 512 cores in the Fermi design activated, which yields a theoretical 14.3 per cent improvement over those initial Fermi chips with only 448 active cores.
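As a rough illustration of how the extra cores and the faster core clock stack up, the sketch below multiplies the two gains together. It assumes peak floating-point throughput scales linearly with both core count and core clock, which is a simplification that holds only for theoretical peaks, and the variable names are made up for this example.

```python
# Theoretical gain from enabling all 512 cores, and the combined gain once
# the faster core clock is factored in. Figures are taken from the article.
# Assumption: peak throughput scales linearly with cores and clock speed.

OLD_CORES, NEW_CORES = 448, 512
OLD_CLOCK_GHZ, NEW_CLOCK_GHZ = 1.15, 1.30

core_ratio = NEW_CORES / OLD_CORES
clock_ratio = NEW_CLOCK_GHZ / OLD_CLOCK_GHZ

core_gain = (core_ratio - 1.0) * 100.0
combined_gain = (core_ratio * clock_ratio - 1.0) * 100.0

print(f"Gain from extra cores alone: {core_gain:.1f} per cent")
print(f"Combined cores and clock:    {combined_gain:.1f} per cent")
```

The core-count gain alone works out to the 14.3 per cent quoted above. The combined figure is a ceiling on paper, and memory-bound codes will care more about the GDDR5 speed bump than the core count.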
Re: WHERE exactly...
By the looks of it, right into the copper slug covering the GPU.
As someone that's been toying with OpenCL...
..I'm impressed at the performance figures.
Now if they would just put some effort into fixing the damned OpenCL compiler and runtime everyone could be happy. The total lack of feedback from the compiler on about half of all compilation errors is annoying. The fact that OpenCL kernels often compile and run and do nothing when they have blatant programming errors also doesn't help.
Still, forefront of technology and all that....
What is that dang thang?
This card is one major reason nVidia is in trouble. Look at that radiator on the board. Heat and process technology are holding the green goblin back from advancing past SpidyAMD in the graphics market. Since nV can't compete on the CPU front, they have turned to Tesla products (all over-hyped and more power hungry). However, nV has done more to advance lower cost HPC performance. Lest the ghost of Phys-X and Havok appear in the middle of the night, homage paid, notorious mention. CUDA and OpenCL are the future. However, AMD is lurking in the corner with Fusion.