Nvidia shows off first 'Kepler' GPUs

PCs first, server GPU coprocessors in Q3

Updated Graphics chip and PC and server processor wannabe Nvidia is lifting the skirt a bit on its next-generation "Kepler" graphics processing units today as it starts talking about the feeds and speeds of its new GeForce graphics cards for desktop and notebook PCs.

As Nvidia co-founder and CEO Jen-Hsun Huang explained when he outed the roadmap for the Kepler GPUs (originally slated for late 2011) and the "Maxwell" follow-ons due in 2013, Nvidia is focused like a laser on performance per watt, not just performance, for its GPU chips. This is because heat, more than any other factor, is the gating issue deciding where GPUs can be adopted and where they cannot.

The promise that Huang made back in September 2010 was that by shifting to a new design and moving to a 28 nanometer wafer-baking process at foundry partner Taiwan Semiconductor Manufacturing Corp, Nvidia could deliver somewhere on the order of three to four times the double-precision floating point operations per watt of the current "Fermi" GPUs, which are used in GeForce graphics cards for PCs, Quadro GPUs for workstations, and Tesla server coprocessors alike. And the shift to Maxwell in 2013 is supposed to deliver 16 times the double-precision flops per watt of the Fermis.

Nvidia's GeForce GTX 680 graphics card

That's a pretty tall order, and one that Nvidia has not had an easy time filling, with TSMC's ramp on 28 nanometer processes being steeper than expected. But today, with the unveiling of the GeForce GTX 680 for PCs and the GT 640M for mobile PCs, Nvidia is trying to prove to potential OEM customers that build PCs and notebooks – as well as the end users who will buy them – that it has the speed they crave.

The data is a bit thin as El Reg goes to press, but here's what Sumit Gupta, senior product manager of the Tesla line at Nvidia, told us ahead of the skirt-raising today. As is the case with CPU manufacturers, Nvidia is scaling back the clock speed on the cores in the Kepler GPU while jacking up the number of cores to get more performance and even more performance per watt. Performance scales more or less linearly (okay, less) with the number of cores on a CPU or GPU, but power consumption and heat dissipation go up much faster than linearly with clock speed, because higher clocks generally need higher voltages. So a small reduction in clock speed can mean a lot, and then you can use a process shrink, like Nvidia's move from 40 nanometer to 28 nanometer processes, to cram more cores onto the die and thereby boost the performance per watt and the raw performance, too.
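The trade-off between clocks and cores can be sketched with the textbook approximation that dynamic power is proportional to capacitance times voltage squared times frequency. The sketch below is not Nvidia's model: it assumes that voltage scales linearly with clock speed (which makes power roughly cubic in clock), that throughput is perfectly linear in cores and clock, and it uses a hypothetical design point of a 20 per cent lower clock with twice the cores.

```python
def relative_dynamic_power(clock_scale: float, core_scale: float) -> float:
    """Relative dynamic power under P ~ cores * V^2 * f.

    Assumption (not from the article): supply voltage tracks clock
    speed linearly, so power grows with the cube of the clock.
    """
    voltage_scale = clock_scale
    return core_scale * voltage_scale ** 2 * clock_scale


def relative_throughput(clock_scale: float, core_scale: float) -> float:
    """Idealised throughput: perfectly linear in both cores and clock."""
    return core_scale * clock_scale


# Hypothetical design point: 20% lower clock, twice as many cores.
clock_scale, core_scale = 0.8, 2.0
power = relative_dynamic_power(clock_scale, core_scale)
perf = relative_throughput(clock_scale, core_scale)
perf_per_watt = perf / power

print(f"power x{power:.3f}, throughput x{perf:.2f}, perf/watt x{perf_per_watt:.4f}")
```

Under these assumptions the wider, slower chip delivers about 1.6 times the throughput for roughly the same power. Real chips also have static leakage and less-than-linear scaling, so treat the numbers as illustrative only.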

To keep things straight between the PCs and the servers, El Reg had Gupta dub the one used in GeForce PC GPUs "Kepler1" because it will have a different design from the one used in Tesla server coprocessors at the heart of a number of very large and powerful supercomputers later this year. We'll call that one "Kepler2", which will have a heavy dose of double-precision floating point processing as well as more memory, ECC scrubbing on the memory, different packaging aimed at servers, and a higher price tag.

The Kepler1 GPU used in the GeForce GTX 680 graphics card will have 1,536 CUDA cores, which will run at 1006MHz and will have a turbo boost speed of 1058MHz. This card has 2GB of GDDR5 graphics memory with a 256-bit path to memory running at 6Gb/sec. The card will have two 6-pin power connectors and will have two DVI ports and one HDMI port, and most significantly, will slide into PCI-Express 3.0 peripheral slots coming with the "Ivy Bridge" family of Core processors from Intel.
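For a rough sense of what those memory figures mean, peak bandwidth can be estimated as bus width times per-pin data rate, divided by eight bits per byte. This is a back-of-the-envelope sketch that assumes the quoted 6Gb/sec is the effective data rate per pin. The function name is made up for illustration.

```python
def peak_memory_bandwidth_gb_per_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical peak bandwidth in gigabytes per second.

    Assumption: the data rate applies to every pin of the memory bus.
    """
    bits_per_byte = 8
    return bus_width_bits * gbit_per_s_per_pin / bits_per_byte


# Figures quoted for the GeForce GTX 680: 256-bit path, 6Gb/sec.
gtx680_bandwidth = peak_memory_bandwidth_gb_per_s(256, 6.0)
print(f"GTX 680 peak memory bandwidth: {gtx680_bandwidth:.0f} GB/sec")
```

That works out to 192 GB/sec of theoretical peak bandwidth. Sustained bandwidth in real workloads will be lower.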

With the Fermi designs, the GPU had 512 cores organized into groups of 32 known as streaming multiprocessors, or SMs for short, with 64KB of configurable L1 cache and shared memory added to each SM for the first time and a 768KB L2 cache shared across the chip. The Fermi had 16 of these SMs and either 3GB or 6GB of GDDR5 memory. The initial Fermis only shipped with 448 cores activated in the top-end models, due to the typical yield issues that all chip makers face. The Fermis weighed in at between 225 watts and 250 watts in a discrete graphics card and Tesla coprocessor, and originally ran at 1.15GHz and were boosted to 1.3GHz.

The new Kepler GPU puts 192 cores into a "streaming multiprocessor extreme" with a slightly modified CUDA core, according to Gupta. Eight of these SMX units are on the GPU for a total of 1,536 cores. For whatever reason, Nvidia is not releasing any single-precision or double-precision floating point performance figures yet on the Kepler GPUs, but says that the new SMX module offers twice the performance per watt of the prior Fermi SM unit, and because a card only burns 195 watts, it offers much better performance per watt.
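The core-count arithmetic for the two generations is easy to check. The snippet below only multiplies out the figures quoted in this story. A higher core count does not by itself mean proportionally higher performance, since the Kepler cores run at a lower clock and have been modified.

```python
# Figures quoted in the article.
FERMI_SMS, FERMI_CORES_PER_SM = 16, 32
KEPLER_SMXS, KEPLER_CORES_PER_SMX = 8, 192

fermi_cores = FERMI_SMS * FERMI_CORES_PER_SM
kepler_cores = KEPLER_SMXS * KEPLER_CORES_PER_SMX
core_ratio = kepler_cores / fermi_cores

print(f"Fermi: {fermi_cores} cores, Kepler: {kepler_cores} cores, ratio: {core_ratio:.1f}x")
```

So Kepler packs three times as many CUDA cores as Fermi into half as many multiprocessor blocks.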

For the gamers out there, it might take three of the GeForce GTX 580 graphics cards, which burned 732 watts, to play the Samaritan video game. But now, Nvidia is claiming that you can get the same performance with only one GeForce GTX 680 video card, and this will only burn 195 watts. No word on what the pricing will be, but the GTX 680 will almost certainly cost more than a single GTX 580 – particularly with the 28 nanometer wafers coming out of TSMC being in short supply.
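Taking Nvidia's claim at face value, the implied gain in performance per watt on this one workload is a simple ratio. The sketch assumes the 732 watts covers all three GTX 580 cards together and that the single GTX 680 really does match their combined performance, which is Nvidia's claim and not an independent measurement.

```python
# Figures quoted in the article (Nvidia's claim for one workload).
GTX580_CARDS = 3
GTX580_TOTAL_WATTS = 732.0  # assumed to be the three cards combined
GTX680_WATTS = 195.0

watts_per_gtx580 = GTX580_TOTAL_WATTS / GTX580_CARDS
# Same performance for less power, so the perf/watt gain is the power ratio.
perf_per_watt_gain = GTX580_TOTAL_WATTS / GTX680_WATTS

print(f"Per GTX 580: {watts_per_gtx580:.0f} W")
print(f"Implied perf/watt gain: {perf_per_watt_gain:.2f}x")
```

That is an implied improvement of roughly 3.75 times on this workload, well above the twofold per-SMX figure Nvidia quotes, so the comparison is probably a best case.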

The main thing as far as Nvidia is concerned is that the GTX 680 offers anywhere from 1.2 to 1.6 times the performance of rival Advanced Micro Devices' HD7970 graphics card.

On the notebook front, Nvidia is talking a little bit about the GeForce GT 640M mobile GPU, and is bashing Intel's integrated HD3000 graphics card because it can't do better than 20 frames per second playing all the popular high-res games out there – making ultrabooks not so ultra. But you can get more than 30 frames per second with the GT 640M, says Gupta, which is twice as power-efficient as the GT 580M it replaces.

The upshot is that if you hold the performance of the notebook steady on a composite of commercial and game benchmark tests, a notebook from early 2010 with a GeForce GTX 285M card weighed in at 12 pounds, was 60mm thick, and had two hours of battery life. By this time last year, you could get a notebook with a GTX 460M and it weighed 9 pounds, was 50mm thick, and had three hours of battery life running the benchmarks. This year's ultrabook – in this case, an Acer Timeline Ultra M3 – weighs 5 pounds, is 20mm thick and has 8 hours of battery life running the composite benchmarks.

And yes, the Kepler1 GPUs can play Crysis 2...

So that leaves us with the Kepler2 GPUs. Gupta says that these are still on track to ship in Tesla GPU coprocessors to Oak Ridge National Laboratory for its "Titan" supercomputer and to the University of Illinois for its "Blue Waters" big bad box in the third quarter. Volume shipments of the server coprocessors bearing the Kepler2 GPUs will start in the fourth quarter of this year. These Kepler2 GPUs will have three times the performance per watt of the top-end Fermi coprocessors today.

"With Tesla, everything is larger and more," says Gupta. But he declined to give any specific details.

Update: After El Reg went to press, Nvidia put a few more details out about the new Kepler-based GPUs. First, the GTX 680 is expected to have a street price of $499 and will be available through the usual suspects. ASUS, Colorful, EVGA, Gainward, Galaxy, Gigabyte, Innovision 3D, MSI, Palit, Point of View, PNY, and Zotac were called out specifically by Nvidia.

Here's a die shot of the new Kepler GPU, which has 3.54 billion transistors:

Die shot of the Nvidia Kepler GPU

Some of the finer points of the performance features of the GTX 680 graphics card are in this blog post. The thing that jumps out, of course, is that the GTX 680 is not as fast as a lot of users would like, but because it runs cooler than the GTX 580 – and often a lot cooler in real-world situations – you can double or triple up the graphics cards to get a lot more performance. That's Nvidia's way of saying the laws of thermodynamics are forcing it to make it up in volume.

Nvidia also said that, in addition to Acer, the likes of Asus, Dell, Hewlett-Packard, Lenovo, LG, Samsung, Sony, and Toshiba all have plans to use the GeForce 600M family of mobile GPUs – not necessarily the GT 640M, mind you – in their ultrabook designs.

In a different blog post, Nvidia laid out the SKUs for the GT 600M series:

Nvidia Kepler GT 600M GPUs

Here's what the higher end GTX 600M models look like:

X Marks the mobile spot with the Kepler GTX 600M GPUs

As you can see, Nvidia is scaling up the number of cores, the clock speeds, the memory interface width, and graphics memory bandwidth to hit ten different performance points, and presumably ten different price points, for notebooks. Pricing on these embedded GPUs was not announced. ®
