Feeds

Deconstructing databases with Jim Gray

A genuine guru

New hybrid storage solutions

Graphical Processor Units or GPUs

What are your views on using Graphical Processor Units (GPUs) for data processing?

Unbeknownst to most of us, the graphics community has been building vector co-processors and putting them into every laptop, every desktop. At present, the typical vector co-processor has about 40 times the CPU power - instruction per second power - of the CPU. It also has about ten times the memory bandwidth.

Intel and AMD talk about multi-core; well, a typical GPU has a hundred cores, a hundred pipelines and is doing all this texture mapping in parallel. So if you could figure out how to program the GPU you could go a lot faster, and the coda to this is that the GPU continues to double at a much faster rate of speed than CPUs. So it is incumbent upon us software guys to use the instructions where we find them, and we are beginning to find lots and lots of instructions in the GPU.

There is a Microsoft research project, which has published a technical report, I think two months ago, about Accelerator [Microsoft Tech Report 2005-184 - Ed], a general purpose programming language that extends C# and allows you to program array-oriented operations in video and ATI chips. The language is generic; you can program not knowing which GPU you've got.

This project was working with some people at the University of North Carolina and they are graphics guys who know how to program GPUs and they came up with a novel algorithm for sorting. So the general thing we are doing is going methodically through the problems and seeing if we can map those problems to GPUs. I'm currently looking at processing satellite imagery and much of the satellite image processing fits GPUs beautifully because it is fundamentally pixel programming. My main role in working on that paper was that, in doing something like sorting, you have to balance I/O and CPU and memory utilisation. So it's a system-wide design and I helped the people at North Carolina with making sure that the I/O could deliver data quickly enough so that we weren't disk-bound. We had to use several disks in order to keep the system busy because a disk only gives you data at about 50MB per second and the GPU can sort at about 500MB per second.

What kind of applications would the GPU processing would suit?

Crypto would be a good application. In addition, many of the large data applications; like mapping applications or sensor applications. Of course, the most natural thing is any kind of the processing of cameras so if you can get Photoshop recast using the GPU instead of the CPU it would run a lot faster. Fundamentally highly parallel "do this to all the pixels" applications are good.

Of course, this doesn't fit within the x86 instruction set and it requires writing algorithms in a very different way; you have to think in a very different way. The bitonic sort [see a general definition of this term here - Ed] that we used with the GPUs really is radically different from any sort anyone has ever thought of before. It is massively parallel. You have to think of a single instruction stream, multi-output, data-stream architecture, and most people are used to the ALGOL, COBOL single instruction stream, single data-stream idea [and, in my experience, only a few such programmers can reliably code parallel processes, as Gray says later - Ed]

Why is this taking so long to be accepted?

Remember the year 2000 problem? Looking back it's amazing people had to work so hard changing two digit to four digit dates. Many of us use libraries that already exist, many of us use algorithms that already exist, we all know that weighted sorting is pretty good and quick sort is pretty good. Then somebody comes along and says "No, no, no you've got to use bitonic sort and you've got to figure out how to program everything in parallel and incidentally the memory model is different and you just have to think differently, and oh, incidentally they vary from one machine to the next". So you say, "Oh, excuse me, how does this work on a server? I have a multi-threaded system and how does the GPU get shared among the multiple threads?"

There are a few details like that that haven't, quote "been worked out". About two weeks ago IBM came out with a cell architecture which is a fairly radical departure from...it's not just a GPU any more 48:42 it's an array of array processors and so we are seeing an architectural blossoming right now which could make life so confusing for people.

You can see why people end up saying, "You know, I'm a C programmer, I'm just going to stick to my good old C programs and I'm sure they'll be working in a few years and if I invest in one of these bizarre architectures they are likely to have evaporated and I'll have wasted all that effort."

Another problem is that people find it hard to think in parallel. If you read a cookbook, it tells you how to make things and there's parallelism inside the cookbook. They never tell you to do things at the same time...you know, each step is do this, do this, do this, it's all sequential. Knitting is another very sequential process.

Reducing the cost and complexity of web vulnerability management

Next page: Real time analytics

More from The Register

next story
New 'Cosmos' browser surfs the net by TXT alone
No data plan? No WiFi? No worries ... except sluggish download speed
'Windows 9' LEAK: Microsoft's playing catchup with Linux
Multiple desktops and live tiles in restored Start button star in new vids
iOS 8 release: WebGL now runs everywhere. Hurrah for 3D graphics!
HTML 5's pretty neat ... when your browser supports it
Mathematica hits the Web
Wolfram embraces the cloud, promies private cloud cut of its number-cruncher
Google extends app refund window to two hours
You now have 120 minutes to finish that game instead of 15
Mozilla shutters Labs, tells nobody it's been dead for five months
Staffer's blog reveals all as projects languish on GitHub
SUSE Linux owner Attachmate gobbled by Micro Focus for $2.3bn
Merger will lead to mainframe and COBOL powerhouse
iOS 8 Healthkit gets a bug SO Apple KILLS it. That's real healthcare!
Not fit for purpose on day of launch, says Cupertino
Profitless Twitter: We're looking to raise $1.5... yes, billion
We'll spend the dosh on transactions, biz stuff 'n' sh*t
prev story

Whitepapers

Secure remote control for conventional and virtual desktops
Balancing user privacy and privileged access, in accordance with compliance frameworks and legislation. Evaluating any potential remote control choice.
WIN a very cool portable ZX Spectrum
Win a one-off portable Spectrum built by legendary hardware hacker Ben Heck
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Beginner's guide to SSL certificates
De-mystify the technology involved and give you the information you need to make the best decision when considering your online security options.