Core Wars: Inside Intel's power struggle with NVIDIA

Kepler takes Knights Corner?

GPU Technology Conference: Intel and NVIDIA are battling for the hearts and minds of developers in massively parallel computing.

Intel has been saying for years that concurrency rather than clock speed is the future of high performance computing, yet it has been slow to provide the mass of low-power, high-efficiency CPU cores needed to take full advantage of that insight.

Meanwhile, GPUs are already designed for power-efficient, massively parallel computing. Back in 2006 NVIDIA exploited their potential for general-purpose computing with its CUDA architecture, adding shared memory and other features to the GPU and providing supporting libraries and the CUDA SDK. CUDA is primarily a set of extensions to C, though there are wrappers for other languages.

[Photo: Jen-Hsun Huang. Caption: Huang's Tesla K20 will serve intense computing]

At NVIDIA’s GPU Technology Conference in San Jose, California, last week, the company announced new editions of its Tesla GPU accelerator boards based on its “Kepler” architecture. These boards are designed for accelerating general-purpose computing rather than for driving displays. The Tesla K10, available now, has two Kepler GK104 GPUs, 3,072 cores in total, and performs at up to 4,577 gigaflops single precision (2,288 gigaflops per GPU).

The Tesla K20, expected in the fourth quarter of 2012, uses two of the forthcoming Kepler GK110 GPUs, which promise over 1,000 gigaflops double precision. “It’s intended for applications like computational fluid dynamics, finite element analysis, computational finance, physics, quantum chemistry, and so on,” explained chief executive Jen-Hsun Huang in his keynote speech.

Power efficiency, which is the true limitation on supercomputer performance, has also been a focus, and NVIDIA claims a threefold improvement in performance per watt compared to the previous “Fermi” generation.

The not-yet-available K20 is really the one you want, and not only because of its better performance. Although both the GK104 and the GK110 are called Kepler, there are several key advances that only appear in the GK110. A Grid Management Unit in the GK110 enables a feature called Dynamic Parallelism, which means that the GPU can schedule its own work. Previously only the CPU could schedule work on the GPU. Dynamic Parallelism means that more code can run entirely on the GPU, for greater efficiency and simplified code.

Another GK110 advance is Hyper-Q, which provides 32 simultaneous connections between CPU and GPU, compared to just one in Fermi. The result is that multiple CPUs can launch work on the GPU simultaneously, greatly improving utilisation.

NVIDIA now projects that by 2014, 75 per cent of HPC customers will use GPUs for general purpose computing.

The rise of GPU computing must be troubling to Intel, especially as the focus on power efficiency raises interest in combining ARM CPUs with GPUs, though implementation is unlikely until we have 64-bit ARM on the market. Intel’s response is an initiative called Many Integrated Core (MIC, pronounced Mike). It has similarities with GPU computing, in that MIC boards are accelerator boards with their own memory, and developers need to understand that parts of an application will execute on the CPU, parts on MIC, and that data has to be copied between them.
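The copy-in, compute, copy-out pattern described above can be sketched in plain C++. This is a toy model only: `ToyAccelerator` and `offload_scale` are hypothetical names invented for illustration, the "board memory" is an ordinary vector, and nothing here uses a real MIC or GPU API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an accelerator board with its own memory.
// No real device is involved: "device memory" is an ordinary vector.
struct ToyAccelerator {
    std::vector<float> device_memory;

    // Step 1: copy input from host memory into the board's memory.
    void copy_in(const std::vector<float>& host) { device_memory = host; }

    // Step 2: run the parallel part on the board (here, a plain loop).
    void scale(float factor) {
        for (float& x : device_memory) x *= factor;
    }

    // Step 3: copy results back into an equally sized host buffer.
    void copy_out(std::vector<float>& host) const {
        assert(host.size() == device_memory.size());
        host = device_memory;
    }
};

// Host-side driver: the CPU owns the data, offloads one step, then carries on.
std::vector<float> offload_scale(std::vector<float> host, float factor) {
    ToyAccelerator board;
    board.copy_in(host);
    board.scale(factor);
    board.copy_out(host);
    return host;
}
```

Calling `offload_scale({1.0f, 2.0f, 3.0f}, 2.0f)` returns the scaled values after one round trip through the toy board. On real hardware the two copies cross a bus, so it pays to offload work that is large relative to the data being moved.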

Prototype Knights

Knights Ferry is the MIC prototype, available now to some Intel partners, and has 32 cores and up to 128 threads (four Hyper-Threads per core). Knights Corner will be the production MIC and will have more than 50 cores and over 200 threads. The processor in Knights Ferry, codenamed Aubrey Isle, is based on an older Pentium design for power efficiency, but adds over 100 x86 instructions and a Vector Processing Unit, which is important for many HPC applications. Knights Corner is expected in late 2012 or early 2013.

Intel is supporting MIC with its existing suite of tools for concurrent programming: Parallel Studio XE and Cluster Studio XE. Key components are Threading Building Blocks (TBB), a C++ template library, and Cilk Plus, which extends C/C++ with keywords for task parallelism. Intel is also supporting OpenMP, a standardised set of directives for parallel programming, on MIC, though in doing so it is getting ahead of the standard, since OpenMP does not yet support accelerators. Intel’s Math Kernel Library (MKL) will also be available for C and Fortran. OpenCL, a standard language for programming accelerators, will also be supported on MIC.
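To give a flavour of the directive style OpenMP uses, here is a minimal sketch of a parallel loop in ordinary host C++. It does not target MIC or any other accelerator, and the function name `dot` is chosen purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Dot product parallelised with a single OpenMP directive.
// If the compiler is not run with OpenMP enabled, the pragma is ignored
// and the loop simply runs serially with the same result.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    // Each thread accumulates a private partial sum; OpenMP combines them.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With GCC, building with `-fopenmp` enables the directive. The appeal of this approach is that the serial code stays intact and the parallelism is expressed as an annotation on the loop.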
