Feeds

Deconstructing databases with Jim Gray

A genuine guru

Choosing a cloud hosting partner with confidence

Graphical Processor Units or GPUs

What are your views on using Graphical Processor Units (GPUs) for data processing?

Unbeknownst to most of us, the graphics community has been building vector co-processors and putting them into every laptop, every desktop. At present, the typical vector co-processor has about 40 times the CPU power - instruction per second power - of the CPU. It also has about ten times the memory bandwidth.

Intel and AMD talk about multi-core; well, a typical GPU has a hundred cores, a hundred pipelines and is doing all this texture mapping in parallel. So if you could figure out how to program the GPU you could go a lot faster, and the coda to this is that the GPU continues to double at a much faster rate of speed than CPUs. So it is incumbent upon us software guys to use the instructions where we find them, and we are beginning to find lots and lots of instructions in the GPU.

There is a Microsoft research project, which has published a technical report, I think two months ago, about Accelerator [Microsoft Tech Report 2005-184 - Ed], a general purpose programming language that extends C# and allows you to program array-oriented operations in video and ATI chips. The language is generic; you can program not knowing which GPU you've got.

This project was working with some people at the University of North Carolina and they are graphics guys who know how to program GPUs and they came up with a novel algorithm for sorting. So the general thing we are doing is going methodically through the problems and seeing if we can map those problems to GPUs. I'm currently looking at processing satellite imagery and much of the satellite image processing fits GPUs beautifully because it is fundamentally pixel programming. My main role in working on that paper was that, in doing something like sorting, you have to balance I/O and CPU and memory utilisation. So it's a system-wide design and I helped the people at North Carolina with making sure that the I/O could deliver data quickly enough so that we weren't disk-bound. We had to use several disks in order to keep the system busy because a disk only gives you data at about 50MB per second and the GPU can sort at about 500MB per second.

What kind of applications would the GPU processing would suit?

Crypto would be a good application. In addition, many of the large data applications; like mapping applications or sensor applications. Of course, the most natural thing is any kind of the processing of cameras so if you can get Photoshop recast using the GPU instead of the CPU it would run a lot faster. Fundamentally highly parallel "do this to all the pixels" applications are good.

Of course, this doesn't fit within the x86 instruction set and it requires writing algorithms in a very different way; you have to think in a very different way. The bitonic sort [see a general definition of this term here - Ed] that we used with the GPUs really is radically different from any sort anyone has ever thought of before. It is massively parallel. You have to think of a single instruction stream, multi-output, data-stream architecture, and most people are used to the ALGOL, COBOL single instruction stream, single data-stream idea [and, in my experience, only a few such programmers can reliably code parallel processes, as Gray says later - Ed]

Why is this taking so long to be accepted?

Remember the year 2000 problem? Looking back it's amazing people had to work so hard changing two digit to four digit dates. Many of us use libraries that already exist, many of us use algorithms that already exist, we all know that weighted sorting is pretty good and quick sort is pretty good. Then somebody comes along and says "No, no, no you've got to use bitonic sort and you've got to figure out how to program everything in parallel and incidentally the memory model is different and you just have to think differently, and oh, incidentally they vary from one machine to the next". So you say, "Oh, excuse me, how does this work on a server? I have a multi-threaded system and how does the GPU get shared among the multiple threads?"

There are a few details like that that haven't, quote "been worked out". About two weeks ago IBM came out with a cell architecture which is a fairly radical departure from...it's not just a GPU any more 48:42 it's an array of array processors and so we are seeing an architectural blossoming right now which could make life so confusing for people.

You can see why people end up saying, "You know, I'm a C programmer, I'm just going to stick to my good old C programs and I'm sure they'll be working in a few years and if I invest in one of these bizarre architectures they are likely to have evaporated and I'll have wasted all that effort."

Another problem is that people find it hard to think in parallel. If you read a cookbook, it tells you how to make things and there's parallelism inside the cookbook. They never tell you to do things at the same time...you know, each step is do this, do this, do this, it's all sequential. Knitting is another very sequential process.

Intelligent flash storage arrays

Next page: Real time analytics

More from The Register

next story
Netscape Navigator - the browser that started it all - turns 20
It was 20 years ago today, Marc Andreeesen taught the band to play
Sway: Microsoft's new Office app doesn't have an Undo function
Content aggregation, meet the workplace ... oh
Sign off my IT project or I’ll PHONE your MUM
Honestly, it’s a piece of piss
Return of the Jedi – Apache reclaims web server crown
.london, .hamburg and .公司 - that's .com in Chinese - storm the web server charts
NetWare sales revive in China thanks to that man Snowden
If it ain't Microsoft, it's in fashion behind the Great Firewall
Chrome 38's new HTML tag support makes fatties FIT and SKINNIER
First browser to protect networks' bandwith using official spec
Admins! Never mind POODLE, there're NEW OpenSSL bugs to splat
Four new patches for open-source crypto libraries
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.