Nvidia launches not one but two Kepler2 GPU coprocessors

Uncloaks Tesla K20, K20X extreme oomphers for servers, workstations

Putting the K20X through the HPC paces

To test the relative efficiency of the Fermi and Kepler generations of GPUs, Nvidia grabbed a two-socket server with two Intel Xeon E5-2680 processors spinning at 2.7GHz, dropped in two Fermi M2090 GPU coprocessors, and ran the Linpack Fortran benchmark on the box.

This setup delivered 1.03 teraflops of sustained Linpack performance, a computational efficiency of 61 per cent – meaning that 39 per cent of the system's aggregate peak double-precision floating point performance went up the chimney.

Nvidia then took the same server, yanked out the M2090s and slotted in two K20X coprocessors. The server delivered 2.25 teraflops of sustained Linpack performance, not just because the K20X is more powerful, but because it is more efficient: 76 per cent of the server's aggregate peak performance is actually brought to bear on the Linpack test, thanks to the architectural changes in the Tesla K20 series of coprocessors.
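The efficiency figure is simply sustained Linpack throughput divided by the node's theoretical peak. Here is a back-of-the-envelope sketch of that arithmetic. The per-device peak rates are assumptions taken from published spec sheets rather than from Nvidia's test (8 double-precision flops per cycle per core with AVX for the Xeon, 665 gigaflops for the M2090, 1.31 teraflops for the K20X), and the function names are purely illustrative.

```python
# Assumed peak double-precision rates in gigaflops, from vendor spec sheets,
# not from the article; check them before reusing this sketch.
XEON_E5_2680_PEAK = 8 * 2.7 * 8   # 8 cores x 2.7GHz x 8 DP flops/cycle (AVX)
TESLA_M2090_PEAK = 665.0
TESLA_K20X_PEAK = 1310.0


def node_peak(cpu_peak: float, cpus: int, gpu_peak: float, gpus: int) -> float:
    """Aggregate theoretical peak of a node: CPU sockets plus GPU coprocessors."""
    return cpus * cpu_peak + gpus * gpu_peak


def linpack_efficiency(sustained_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak that the benchmark actually sustained."""
    return sustained_gflops / peak_gflops


fermi_peak = node_peak(XEON_E5_2680_PEAK, 2, TESLA_M2090_PEAK, 2)
kepler_peak = node_peak(XEON_E5_2680_PEAK, 2, TESLA_K20X_PEAK, 2)

fermi_eff = linpack_efficiency(1030.0, fermi_peak)    # 1.03 teraflops sustained
kepler_eff = linpack_efficiency(2250.0, kepler_peak)  # 2.25 teraflops sustained

print(f"Fermi node:  peak {fermi_peak:.1f} GF, efficiency {fermi_eff:.0%}")
print(f"Kepler node: peak {kepler_peak:.1f} GF, efficiency {kepler_eff:.0%}")
```

With those assumed peaks, the sketch lands on the same 61 and 76 per cent figures, which suggests Nvidia counted both the CPUs and the GPUs in the denominator.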

How the K20 stacks up against a Xeon E5 and a Fermi GPU

In another test, this one pitting the GPU coprocessors against – rather than alongside – Intel Xeons, Nvidia fired up the DGEMM double-precision matrix math benchmark on an eight-core Xeon E5-2687W, the 3.1GHz chip made for workstations, which managed 170 gigaflops.

A Fermi-based M2090 managed 430 gigaflops, and the Kepler-based K20X hit 1.22 teraflops. The result matters because DGEMM is the test Intel used a year ago at SC11 to show a prototype of its x86-based Xeon Phi parallel coprocessor breaking through 1 teraflops on a single card.
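DGEMM is the BLAS double-precision general matrix multiply, C = alpha*A*B + beta*C, and benchmark rates for it are conventionally computed by counting 2*m*n*k floating point operations (one multiply and one add per inner-loop step). The sketch below uses a naive pure-Python loop as a stand-in for a tuned BLAS library such as Intel MKL or Nvidia cuBLAS to show what is being counted, then works out the ratios between the three results quoted above. The function names are illustrative, not any library's API.

```python
def dgemm(alpha, a, b, beta, c):
    """Naive DGEMM on lists of lists: returns alpha * (a @ b) + beta * c.

    Real benchmarks call a tuned BLAS; this loop only shows the operation.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply plus one add
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


def dgemm_flops(m, n, k):
    """Conventional DGEMM operation count: 2 flops per inner-loop step."""
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Rate a benchmark would report for one (m x k) by (k x n) multiply."""
    return dgemm_flops(m, n, k) / seconds / 1e9


# DGEMM results quoted in the article, in gigaflops.
xeon_e5, tesla_m2090, tesla_k20x = 170.0, 430.0, 1220.0
print(f"K20X vs Xeon:  {tesla_k20x / xeon_e5:.1f}x")
print(f"K20X vs M2090: {tesla_k20x / tesla_m2090:.1f}x")
```

For scale, a 10,000 by 10,000 DGEMM is 2 x 10^12 flops, so at the K20X's 1.22 teraflops it would take roughly 1.6 seconds.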

The K20 versus Xeons on various scientific apps

GPU accelerators are not just about servers, but also about workstations. Nvidia has spent some time in the labs running real workloads on Xeon or Core i7 workstations and seeing what happens when Tesla K20 or K20X coprocessors are added to the workstation.

In the MATLAB test shown in the chart above, a workstation with one Core i7-2600K processor ran some fast Fourier transform (FFT) routines, and the same routines were then rerun after a Tesla K20 coprocessor was slapped in. The speedup was a factor of 18, because MATLAB speaks CUDA and the work lends itself to offloading to the GPU.
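How much the work "lends itself to offloading" can be quantified with Amdahl's law, a standard model that the article itself does not invoke: if a fraction p of the original runtime moves to an accelerator that runs it s times faster, the overall speedup is 1 / ((1 - p) + p / s). The sketch below assumes that model and ignores data-transfer overheads; Nvidia did not report how the MATLAB runtime split between CPU and GPU, so the 95 per cent and 100X inputs are purely illustrative.

```python
def amdahl_speedup(offload_fraction: float, accel_factor: float) -> float:
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    offload_fraction: share of the original runtime moved to the GPU (0 to 1).
    accel_factor: how many times faster that share runs on the GPU.
    """
    serial = 1.0 - offload_fraction
    return 1.0 / (serial + offload_fraction / accel_factor)


def min_offload_fraction(target_speedup: float) -> float:
    """Smallest offloaded share that could reach target_speedup even with an
    infinitely fast accelerator (the limit as accel_factor grows without bound)."""
    return 1.0 - 1.0 / target_speedup


# An 18X overall speedup needs at least ~94 per cent of the runtime offloaded.
print(f"{min_offload_fraction(18):.1%}")
# Illustrative: with 95 per cent offloaded, a 100X faster kernel yields ~16.8X overall.
print(f"{amdahl_speedup(0.95, 100):.1f}x")
```

The point of the model is that the CPU-resident remainder, not the GPU, quickly becomes the ceiling, which is why only workloads dominated by offloadable kernels see double-digit speedups.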

For the other tests, Nvidia used a workstation with two top-bin Xeon E5-2687W processors paired with two Tesla K20X cards, and the speedup for various applications ranged from 8X to 32X.

Adding K20X coprocessors to Cray supers speeds up apps big time

Nvidia and supercomputer partner Cray are obviously very keen to demonstrate that packaged applications can scale across hundreds or thousands of GPU-accelerated server nodes, and chose to put the QMCPACK materials-science application and the NAMD molecular-dynamics application through their paces on a Cray XK7 system, both with and without K20X GPU accelerators installed.

The tests show not only that the GPU accelerators speed up calculations with these two applications, but also that as you boost the server node count in the XK7 machine – which uses Cray's "Gemini" 3D torus interconnect to hook nodes to each other – the performance of the ceepie-geepie box scales faster than that of the CPU-only machine.

Gupta says that Nvidia has already shipped 30 petaflops' worth of Tesla K20 and K20X coprocessors in the past 30 days. The K20 card will be available through workstation and server makers and through the retail channel where you normally buy graphics cards and other gear. The K20X is a fanless design and, like the fanless Tesla M2090 before it, will only be available through server OEMs, who tweak their machines so that the servers themselves can cool the Kepler cards. The channel will get K20 cards from Nvidia in volume in the middle of this month, with the server OEMs having the K20X cards available in November or December, depending on the OEM.

Pricing is not available on the new units, but El Reg estimates that the Tesla K20 card costs something on the order of $3,000 to $3,500 street, with the K20X commanding perhaps $500 to $1,000 more than that. At that price, two K20X cards will just about triple the cost of a server node, but will offer considerably more performance, as you can see from the data above. ®
