Core Wars: Inside Intel's power struggle with NVIDIA

Kepler takes Knights Corner?

GPU Technology Conference: Intel and NVIDIA are battling for the hearts and minds of developers in massively parallel computing.

Intel has been saying for years that concurrency rather than clock speed is the future of high performance computing, yet it has been slow to provide the mass of low-power, high-efficiency CPU cores needed to take full advantage of that insight.

Another angle on this is that GPUs are already designed for power-efficient massively parallel computing, and back in 2006 NVIDIA exploited its potential for general-purpose computing with its CUDA architecture, adding shared memory and other features to the GPU and providing supporting libraries and the CUDA SDK. CUDA is primarily a set of extensions to C, though there are wrappers for other languages.

Huang's Tesla K20 will serve intense computing

At NVIDIA’s GPU Technology Conference in San Jose, California, last week, the company announced new editions of its Tesla GPU accelerator boards based on its “Kepler” architecture. These boards are designed for accelerating general-purpose computing rather than for driving displays. The Tesla K10, available now, has two Kepler GK104 GPUs with 3,072 cores in total, and peaks at 4,577 single-precision gigaflops (2,288 gigaflops per GPU).
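The K10 figures hang together if peak throughput is taken as cores × clock × floating-point operations per core per cycle. As a rough worked check, assuming a core clock of 745 MHz (a figure not given in this article) and one fused multiply-add, counted as two floating-point operations, per core per cycle, with the 3,072 cores split evenly as 1,536 per GPU:

```latex
\begin{aligned}
\text{per GPU:}\quad   & 1536 \times 0.745\,\text{GHz} \times 2 = 2288.64\ \text{gigaflops} \\
\text{per board:}\quad & 2 \times 2288.64 = 4577.28 \approx 4577\ \text{gigaflops}
\end{aligned}
```

That also explains why the board total is 4,577 rather than 2 × 2,288 = 4,576: the per-GPU figure is rounded down.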

The Tesla K20, expected in the fourth quarter of 2012, uses the forthcoming Kepler GK110 GPU, which promises over 1,000 gigaflops of double-precision performance. “It’s intended for applications like computational fluid dynamics, finite element analysis, computational finance, physics, quantum chemistry, and so on,” explained chief executive Jen-Hsun Huang in his keynote speech.

Power efficiency, which is the true limitation on supercomputer performance, has also been a focus: NVIDIA claims a threefold improvement in performance per watt compared with the previous “Fermi” generation.

The not-yet-available K20 is really the one you want, and not only because of its better performance. Although both the GK104 and the GK110 are called Kepler, several key advances appear only in the GK110. A Grid Management Unit in the GK110 enables a feature called Dynamic Parallelism, which means that the GPU can schedule its own work. Previously only the CPU could schedule work on the GPU. Dynamic Parallelism means that more code can run entirely on the GPU, for greater efficiency and simpler code.
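Dynamic Parallelism pays off when the amount of work is only discovered as the computation runs, as in adaptive refinement. The sketch below is a CPU-side analogy in standard C++, not CUDA: each task inspects its own interval and, if it needs more precision, launches a child task itself rather than handing control back to a coordinating loop. The names `refine` and `integrate`, the choice of adaptive Simpson integration and the depth cap of 20 are illustrative assumptions; on a GK110 the equivalent would be a kernel launching a child kernel from device code.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <future>

using Fn = std::function<double(double)>;

// Simpson's rule estimate of the integral of f over [a, b].
double simpson(const Fn& f, double a, double b) {
    const double m = 0.5 * (a + b);
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b));
}

// Adaptive Simpson integration (hypothetical helper). Each call decides for itself
// whether its interval needs refining and, if so, launches the child work directly,
// the pattern Dynamic Parallelism allows on the GPU, here expressed with CPU threads.
double refine(const Fn& f, double a, double b, double whole, double tol, int depth) {
    const double m = 0.5 * (a + b);
    const double left = simpson(f, a, m);
    const double right = simpson(f, m, b);
    const double delta = left + right - whole;
    if (depth <= 0 || std::fabs(delta) <= 15.0 * tol) {
        return left + right + delta / 15.0;  // accept, with Richardson-style correction
    }
    // Launch the left half as a new asynchronous task; refine the right half here.
    // All captured locals outlive the task because child.get() is called before returning.
    auto child = std::async(std::launch::async,
                            [&] { return refine(f, a, m, left, tol / 2, depth - 1); });
    const double r = refine(f, m, b, right, tol / 2, depth - 1);
    return child.get() + r;
}

// Entry point: one initial estimate, then self-scheduled refinement (depth capped at 20).
double integrate(const Fn& f, double a, double b, double tol) {
    assert(a < b && tol > 0.0);
    return refine(f, a, b, simpson(f, a, b), tol, 20);
}
```

For example, `integrate([](double x) { return std::exp(x); }, 0.0, 1.0, 1e-10)` should return a value very close to e - 1. Spawning an OS thread per split is only reasonable because the refinement tree stays small here; some toolchains need `-pthread` to build it.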

Another GK110 advance is Hyper-Q, which provides 32 simultaneous connections between CPU and GPU, compared to just one in Fermi. The result is that multiple CPUs can launch work on the GPU simultaneously, greatly improving utilisation.

NVIDIA now projects that by 2014, 75 per cent of HPC customers will use GPUs for general purpose computing.

The rise of GPU computing must be troubling to Intel, especially as the focus on power efficiency raises interest in combining ARM CPUs with GPUs, though implementation is unlikely until we have 64-bit ARM on the market. Intel’s response is an initiative called Many Integrated Core (MIC, pronounced Mike). It has similarities with GPU computing, in that MIC boards are accelerator boards with their own memory, and developers need to understand that parts of an application will execute on the CPU, parts on MIC, and that data has to be copied between them.
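In code, that split between host and card looks roughly like the following minimal sketch, assuming Intel's offload pragma for its own compiler (`#pragma offload target(mic)`, with `in` and `out` clauses naming the data copied to and from the card); the exact clause syntax is an assumption to check against Intel's compiler documentation. The function `scale_on_mic` is hypothetical. Compilers without offload support ignore the unknown pragma, so the loop then simply runs on the host CPU.

```cpp
#include <cassert>

// Hypothetical example: computes y[i] = alpha * x[i].
// Under Intel's offload-capable compiler the pragma asks for the loop to run on the
// MIC card: in(...) copies x into the card's memory before the loop, and out(...)
// copies y back to host memory afterwards. Scalars such as n and alpha are copied
// by default. Other compilers ignore the pragma and run the loop on the host.
void scale_on_mic(const float* x, float* y, int n, float alpha) {
    assert(x != nullptr && y != nullptr && n >= 0);
#pragma offload target(mic) in(x : length(n)) out(y : length(n))
    for (int i = 0; i < n; ++i) {
        y[i] = alpha * x[i];
    }
}
```

The explicit `in`/`out` lists are the point: the developer, not the hardware, decides which arrays cross the bus and in which direction.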

Prototype Knights

Knights Ferry is the MIC prototype, available now to some Intel partners, and has 32 cores and up to 128 threads (four threads per core). Knights Corner will be the production MIC, with more than 50 cores and over 200 threads. The processor in Knights Ferry, codenamed Aubrey Isle, is based on an older Pentium design for power efficiency, but adds over 100 new x86 instructions and a Vector Processing Unit, important for many HPC applications. Knights Corner is expected in late 2012 or early 2013.

Intel is supporting MIC with its existing suite of tools for concurrent programming: Parallel Studio XE and Cluster Studio XE. Key components are Threading Building Blocks (TBB), a C++ template library, and Cilk Plus, which extends C/C++ with keywords for task parallelism. Intel is also supporting OpenMP, a standardised set of directives for parallel programming, on MIC, though in doing so it is getting ahead of the standard, since OpenMP does not yet support accelerators. Intel’s Math Kernel Library (MKL) will be available for C and Fortran, and OpenCL, a standard language for programming accelerators, will also be supported on MIC.
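The attraction of OpenMP on MIC is that existing directive-based code carries over. A minimal sketch of a standard OpenMP loop follows; it is plain host OpenMP, the function name `dot` is illustrative, and nothing in it is MIC-specific.

```cpp
#include <cassert>
#include <vector>

// Dot product parallelised with a standard OpenMP directive.
// reduction(+ : sum) gives each thread a private partial sum and adds the partial
// sums together when the loop ends, avoiding a data race on sum.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());  // signed loop index for older OpenMP
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Built without OpenMP support (for example, without GCC's `-fopenmp`), the pragma is ignored and the loop runs serially, giving the same answer up to floating-point rounding.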
