Core Wars: Inside Intel's power struggle with NVIDIA

Kepler takes Knights Corner?


GPU Technology Conference Intel and NVIDIA are battling for the hearts and minds of developers in massively parallel computing.

Intel has been saying for years that concurrency rather than clock speed is the future of high performance computing, yet it has been slow to provide the mass of low-power, high-efficiency CPU cores needed to take full advantage of that insight.

Another angle on this is that GPUs are already designed for power-efficient massively parallel computing, and back in 2006 NVIDIA exploited its potential for general-purpose computing with its CUDA architecture, adding shared memory and other features to the GPU and providing supporting libraries and the CUDA SDK. CUDA is primarily a set of extensions to C, though there are wrappers for other languages.
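To give a flavour of those C extensions, here is a minimal CUDA sketch of the conventional "saxpy" example (illustrative only, not NVIDIA's own code): the `__global__` qualifier and the triple-angle-bracket launch syntax are the extensions, and the explicit host-to-device copies show the accelerator programming model.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// __global__ marks a function compiled for the GPU; each of the many
// lightweight GPU threads computes one element of the result.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float* hx = (float*)malloc(bytes);
    float* hy = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;                       // device (GPU) copies
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // The triple-angle-bracket syntax is CUDA's launch extension:
    // <<<number of blocks, threads per block>>>
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);         // 3*1 + 2 = 5
    cudaFree(dx); cudaFree(dy); free(hx); free(hy);
    return 0;
}
```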


Huang's Tesla K20 will serve intense computing

At NVIDIA’s GPU Technology Conference in San Jose, California, last week, the company announced new editions of its Tesla GPU accelerator boards based on its “Kepler” architecture. These boards are designed for accelerating general-purpose computing rather than for driving displays. The Tesla K10, available now, has two Kepler GK104 GPUs, 3,072 cores in total, and performs at up to 4,577 single-precision gigaflops (2,288 gigaflops per GPU).

The Tesla K20, expected in the fourth quarter of 2012, is built on the forthcoming Kepler GK110 GPU, which promises over 1,000 gigaflops of double-precision performance. “It’s intended for applications like computational fluid dynamics, finite element analysis, computational finance, physics, quantum chemistry, and so on,” explained chief executive Jen-Hsun Huang in his keynote speech.

Power efficiency, the real limiting factor on supercomputer performance, has also been a focus: NVIDIA claims a threefold improvement in performance per watt over the previous “Fermi” generation.

The not-yet-available K20 is really the one you want, and not only because of its better performance. Although both the GK104 and the GK110 are called Kepler, several key advances appear only in the GK110. A Grid Management Unit in the GK110 enables a feature called Dynamic Parallelism, which lets the GPU schedule its own work; previously only the CPU could launch work on the GPU. Dynamic Parallelism means that more code can run entirely on the GPU, for greater efficiency and simpler code.
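In code, Dynamic Parallelism looks like an ordinary kernel launch, except that it appears inside another kernel. A sketch with hypothetical kernel names (this needs the GK110's compute capability 3.5 and compilation with `-arch=sm_35 -rdc=true`):

```cuda
// Child kernel: ordinary GPU work.
__global__ void refine(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 0.5f;
}

// Parent kernel: on the GK110 it can decide, on the GPU, whether more
// work is needed and launch it itself, with no round trip to the CPU.
__global__ void step(float* data, int n, bool needs_refinement) {
    if (threadIdx.x == 0 && blockIdx.x == 0 && needs_refinement) {
        refine<<<(n + 255) / 256, 256>>>(data, n);  // GPU-side launch
        cudaDeviceSynchronize();  // wait for the child grid to finish
    }
}
```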

Another GK110 advance is Hyper-Q, which provides 32 simultaneous hardware work queues between CPU and GPU, compared with just one in Fermi. The result is that multiple CPU cores or processes can launch work on the GPU at the same time, greatly improving utilisation.
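From the programmer's side Hyper-Q needs no new syntax: work submitted on separate CUDA streams simply stops serialising. A sketch, assuming a kernel `work` defined elsewhere:

```cuda
__global__ void work(float* buf, int n);  // assumed defined elsewhere

void launch_concurrent(float** bufs, int n) {
    const int kStreams = 8;
    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i)
        cudaStreamCreate(&streams[i]);

    // Each launch goes into its own stream. On Fermi all streams feed a
    // single hardware queue, so independent work can still serialise;
    // the GK110's 32 Hyper-Q queues let these run concurrently.
    for (int i = 0; i < kStreams; ++i)
        work<<<64, 256, 0, streams[i]>>>(bufs[i], n);

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
}
```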

NVIDIA now projects that by 2014, 75 per cent of HPC customers will use GPUs for general purpose computing.

The rise of GPU computing must be troubling for Intel, especially as the focus on power efficiency is stirring interest in combining ARM CPUs with GPUs, though serious deployments are unlikely until 64-bit ARM reaches the market. Intel’s response is an initiative called Many Integrated Core (MIC, pronounced “Mike”). It resembles GPU computing in that MIC boards are accelerator boards with their own memory, and developers need to understand that parts of an application will execute on the CPU, parts on the MIC card, and that data has to be copied between them.
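That offload model can be sketched with the directive style Intel's compiler supports. This is an illustrative example, not Intel's own code: under the Intel compiler the pragma offloads the loop to the MIC card and describes the host-card copies, while other compilers ignore the unrecognised pragma and run the loop on the host.

```cpp
#include <cstddef>

// Sum of squares, offloaded to the MIC card under Intel's compiler.
// The in/inout clauses describe what is copied to and from card memory.
double sum_squares(const double* x, long n) {
    double total = 0.0;
    #pragma offload target(mic) in(x : length(n)) inout(total)
    {
        for (long i = 0; i < n; ++i)
            total += x[i] * x[i];   // runs on the accelerator's x86 cores
    }
    return total;
}
```

The point of the pragma approach is that the same source builds and runs, serially, on a plain host compiler, which is part of Intel's pitch against CUDA.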

Prototype Knights

Knights Ferry is the MIC prototype, available now to some Intel partners, with 32 cores and up to 128 threads (four hardware threads per core). Knights Corner will be the production MIC, with more than 50 cores and over 200 threads. The processor in Knights Ferry, codenamed Aubrey Isle, is based on an older Pentium design for power efficiency, but adds a 512-bit Vector Processing Unit, important for many HPC applications, along with over 100 new x86 instructions to drive it. Knights Corner is expected in late 2012 or early 2013.

Intel is supporting MIC with its existing suite of tools for concurrent programming: Parallel Studio XE and Cluster Studio XE. Key components are Threading Building Blocks (TBB), a C++ template library, and Cilk Plus, which extends C/C++ with keywords for task parallelism. Intel is also supporting OpenMP, a standardised set of directives for parallel programming, on MIC, though in doing so it is getting ahead of the standard, since OpenMP does not yet support accelerators. Intel’s Math Kernel Library (MKL) will also be available for C and Fortran, and OpenCL, a standard language for programming accelerators, will be supported on MIC as well.
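As a flavour of that directive style, here is a generic OpenMP parallel reduction in C++ (not Intel-specific code): the single pragma splits the loop across cores, and a compiler without OpenMP enabled simply ignores it and runs the loop serially.

```cpp
#include <vector>

// Dot product: OpenMP divides the iterations among threads and the
// reduction clause combines the per-thread partial sums safely.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}
```

This incremental, annotate-existing-C++ approach is Intel's main selling point against rewriting kernels in CUDA.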
