Nvidia launches not one but two Kepler2 GPU coprocessors

Uncloaks Tesla K20, K20X extreme oomphers for servers, workstations

Putting the K20X through the HPC paces

To test the relative efficiency of the Fermi and Kepler generations of GPUs, Nvidia grabbed a two-socket server with two Intel Xeon E5-2680 processors spinning at 2.7GHz, dropped in two Fermi M2090 GPU coprocessors, and ran the Linpack Fortran benchmark on the box.

This setup delivers 1.03 teraflops of sustained Linpack performance, with a computational efficiency of 61 per cent – meaning that 39 per cent of the aggregate double-precision floating point performance of the system went up the chimney.

Nvidia then took the same server, yanked out the M2090s, and slotted in two K20X coprocessors. The server was able to deliver 2.25 teraflops of sustained Linpack performance, not just because the K20X is more powerful, but also because it is more efficient. In fact, 76 per cent of the aggregate performance in the server is actually brought to bear on the Linpack test thanks to the architectural changes in the Tesla K20 series of coprocessors.
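The arithmetic behind those two Linpack runs can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not Nvidia's methodology: it assumes computational efficiency is simply sustained performance divided by aggregate peak, and the function and variable names are made up for the example. Only the sustained teraflops and efficiency figures come from the test results quoted above.

```python
def implied_peak_tflops(sustained_tflops, efficiency):
    """Aggregate peak implied by a sustained result and its efficiency.

    Assumes efficiency = sustained / peak, so peak = sustained / efficiency.
    """
    return sustained_tflops / efficiency


# Figures quoted for the two-socket Xeon E5-2680 server.
fermi_sustained, fermi_eff = 1.03, 0.61    # with two M2090 cards
kepler_sustained, kepler_eff = 2.25, 0.76  # with two K20X cards

fermi_peak = implied_peak_tflops(fermi_sustained, fermi_eff)
kepler_peak = implied_peak_tflops(kepler_sustained, kepler_eff)

# Split the overall speedup into "more raw flops" and "better efficiency".
speedup = kepler_sustained / fermi_sustained
peak_ratio = kepler_peak / fermi_peak
eff_ratio = kepler_eff / fermi_eff

print(f"Implied peak, Fermi box:  {fermi_peak:.2f} teraflops")
print(f"Implied peak, Kepler box: {kepler_peak:.2f} teraflops")
print(f"Sustained speedup: {speedup:.2f}x "
      f"(peak {peak_ratio:.2f}x times efficiency {eff_ratio:.2f}x)")
```

On these numbers, the sustained speedup is the product of the gain in implied peak and the gain in efficiency, which is the point being made about the K20X being both more powerful and more efficient.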

How the K20 stacks up against a Xeon E5 and a Fermi GPU

In another test to show how the GPU coprocessors stack up against – rather than with – Intel Xeons, Nvidia fired up the DGEMM double-precision matrix math benchmark on an eight-core Xeon E5-2687W, the 3.1GHz chip made for workstations. The Xeon was able to do 170 gigaflops.

A Fermi-based M2090 could do 430 gigaflops, and the Kepler-based K20X could do 1.22 teraflops. This test is important in that the DGEMM test is what Intel used to show a prototype Xeon Phi x86-based parallel coprocessor breaking through 1 teraflops on a single prototype card a year ago at SC11.
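To put the DGEMM figures side by side, here is a small Python sketch that normalizes each result against a chosen baseline. The dictionary keys and the `relative_to` helper are hypothetical names for this illustration. The gigaflops values are the ones quoted above, with 1.22 teraflops written as 1,220 gigaflops.

```python
# DGEMM results quoted in the article, in gigaflops.
dgemm_gflops = {
    "xeon_e5": 170.0,   # eight-core 3.1GHz workstation Xeon
    "m2090": 430.0,     # Fermi-based Tesla M2090
    "k20x": 1220.0,     # Kepler-based Tesla K20X (1.22 teraflops)
}


def relative_to(baseline, results):
    """Return each result as a multiple of the named baseline's result."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


vs_xeon = relative_to("xeon_e5", dgemm_gflops)
vs_fermi = relative_to("m2090", dgemm_gflops)

for name, ratio in vs_xeon.items():
    print(f"{name}: {ratio:.2f}x the Xeon E5")
print(f"k20x: {vs_fermi['k20x']:.2f}x the M2090")
```

These ratios compare one chip against one card on a single benchmark, so they say nothing about a whole server node, where CPUs and GPUs work together as in the Linpack test.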

The K20 versus Xeons on various scientific apps

GPU accelerators are not just about servers, but also about workstations. Nvidia has spent some time in the labs running real workloads on Xeon or Core i7 workstations and seeing what happens when Tesla K20 or K20X coprocessors are added to the workstation.

On the MATLAB application shown in the chart above, a workstation with one i7-2600K processor ran some fast Fourier transform (FFT) routines, and then the same routines were run after slapping in a Tesla K20 coprocessor. The speedup was a factor of 18 because the MATLAB software speaks CUDA and the work lends itself to offloading to the GPU.

For the other tests, Nvidia used a workstation with two top-bin E5-2687W processors paired with two Tesla K20X coprocessors, and the speedup for various applications ranged from a low of 8X to a high of 32X.

Adding K20X coprocessors to Cray supers speeds up apps big time

Nvidia and supercomputer partner Cray are obviously very keen to demonstrate that packaged applications can scale across hundreds or thousands of server nodes equipped with GPU accelerators, and chose to put the QMCPACK materials-science application and the NAMD molecular-dynamics application through their paces on a Cray XK7 system both with and without K20X GPU accelerators installed.

The tests show not only that the GPU accelerators can speed up calculations with these two applications, but also that as you boost the server node count in the XK7 machine – which uses Cray's "Gemini" 3D torus interconnect to hook nodes to each other – the performance of the ceepie-geepie box scales further and faster than that of the CPU-only machine.

Gupta says that Nvidia has already shipped 30 petaflops worth of Tesla K20 and K20X coprocessors in the past 30 days. The K20 card will be available through workstation and server makers and through the retail channel where you normally buy graphics cards and other gear. The K20X, which is a fanless design, will be like the Tesla M2090 fanless coprocessor before it, and will only be available through server OEMs who tweak their machines to allow them to do the cooling for the Kepler cards. The channel will be getting K20 cards from Nvidia in the middle of this month in volume, with the server OEMs having the K20X cards available in November or December, depending on the OEM.

Pricing is not available on the new units, but El Reg estimates that the Tesla K20 card probably costs something on the order of $3,000 to $3,500 street, with the K20X commanding perhaps $500 to $1,000 more than that. At that price, two K20X cards will just about triple the cost of a server node, but will offer considerably more performance on workloads, as you can see from the data above. ®
