Deconstructing databases with Jim Gray

A genuine guru

Graphics Processing Units or GPUs

What are your views on using Graphics Processing Units (GPUs) for data processing?

Unbeknownst to most of us, the graphics community has been building vector co-processors and putting them into every laptop and every desktop. At present, the typical vector co-processor has about 40 times the processing power - instructions-per-second power - of the CPU. It also has about ten times the memory bandwidth.

Intel and AMD talk about multi-core; well, a typical GPU has a hundred cores, a hundred pipelines and is doing all this texture mapping in parallel. So if you could figure out how to program the GPU you could go a lot faster, and the coda to this is that the GPU continues to double at a much faster rate of speed than CPUs. So it is incumbent upon us software guys to use the instructions where we find them, and we are beginning to find lots and lots of instructions in the GPU.

There is a Microsoft Research project, which published a technical report, I think two months ago, about Accelerator [Microsoft Tech Report 2005-184 - Ed], a general purpose programming language that extends C# and allows you to program array-oriented operations on Nvidia and ATI chips. The language is generic; you can program not knowing which GPU you've got.

This project was working with some people at the University of North Carolina and they are graphics guys who know how to program GPUs and they came up with a novel algorithm for sorting. So the general thing we are doing is going methodically through the problems and seeing if we can map those problems to GPUs. I'm currently looking at processing satellite imagery and much of the satellite image processing fits GPUs beautifully because it is fundamentally pixel programming. My main role in working on that paper was that, in doing something like sorting, you have to balance I/O and CPU and memory utilisation. So it's a system-wide design and I helped the people at North Carolina with making sure that the I/O could deliver data quickly enough so that we weren't disk-bound. We had to use several disks in order to keep the system busy because a disk only gives you data at about 50MB per second and the GPU can sort at about 500MB per second.
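The balance Gray describes comes down to simple arithmetic on the two rates he quotes. The sketch below is only an illustration: the function name is hypothetical, and it assumes ideal striping in which throughput adds up perfectly across disks. Gray says only that "several disks" were used, so the result is the ratio of his two figures, not the configuration of the actual system.

```python
import math

def disks_needed(consumer_mb_per_s, disk_mb_per_s):
    """Smallest number of disks whose combined streaming rate keeps the
    consumer busy, assuming throughput scales linearly with disk count."""
    return math.ceil(consumer_mb_per_s / disk_mb_per_s)

# Rates quoted in the interview: about 50MB/s per disk, about 500MB/s for the GPU sort.
print(disks_needed(500, 50))
```

A real sort also has to write its output, so the ideal-striping figure is a lower bound rather than a sizing recommendation.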

What kind of applications would GPU processing suit?

Crypto would be a good application. So would many of the large data applications, like mapping applications or sensor applications. Of course, the most natural thing is any kind of processing of camera images, so if you can get Photoshop recast using the GPU instead of the CPU it would run a lot faster. Fundamentally, highly parallel "do this to all the pixels" applications are good.
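A "do this to all the pixels" workload can be sketched in a few lines. The example below is a plain Python illustration, not GPU code: the names `brighten` and `apply_to_all_pixels` and the tiny greyscale image are all hypothetical. What matters is that the per-pixel function reads only its own input, so no pixel depends on another and the work could in principle be spread over as many pipelines as the hardware offers.

```python
def brighten(pixel, amount=40):
    """Per-pixel kernel: the result depends only on this one pixel."""
    return min(255, pixel + amount)

def apply_to_all_pixels(image, kernel):
    """Apply the same kernel to every pixel. The calls are independent of
    one another, so they could run in any order or all at once."""
    return [[kernel(p) for p in row] for row in image]

# A hypothetical 2x3 greyscale image with values in 0..255.
image = [[0, 100, 200],
         [250, 30, 128]]
result = apply_to_all_pixels(image, brighten)
print(result)
```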

Of course, this doesn't fit within the x86 instruction set and it requires writing algorithms in a very different way; you have to think in a very different way. The bitonic sort [see a general definition of this term here - Ed] that we used with the GPUs really is radically different from any sort anyone has ever thought of before. It is massively parallel. You have to think of a single instruction stream, multiple data-stream architecture, and most people are used to the ALGOL, COBOL single instruction stream, single data-stream idea [and, in my experience, only a few such programmers can reliably code parallel processes, as Gray says later - Ed].
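For readers who have not met bitonic sort, here is a minimal sequential sketch of the textbook sorting network in Python. It is not the GPU implementation Gray refers to, and it assumes the input length is a power of two. The point to notice is that within each pass the compare-exchange steps touch disjoint pairs of elements, which is what lets parallel hardware run a whole pass at once.

```python
def bitonic_sort(values):
    """Return a sorted copy of values using a bitonic sorting network.
    Assumes len(values) is a power of two (or zero)."""
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between the two elements compared
            # Each index i is paired with exactly one partner (i XOR j),
            # so all compare-exchanges in this pass are independent.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data

print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The pattern of comparisons depends only on the length of the input, never on the data, which is why the same instruction stream can be applied to many data elements in lockstep.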

Why is this taking so long to be accepted?

Remember the year 2000 problem? Looking back, it's amazing people had to work so hard changing two-digit to four-digit dates. Many of us use libraries that already exist, many of us use algorithms that already exist, we all know that weighted sorting is pretty good and quick sort is pretty good. Then somebody comes along and says "No, no, no you've got to use bitonic sort and you've got to figure out how to program everything in parallel and incidentally the memory model is different and you just have to think differently, and oh, incidentally they vary from one machine to the next". So you say, "Oh, excuse me, how does this work on a server? I have a multi-threaded system and how does the GPU get shared among the multiple threads?"

There are a few details like that that haven't, quote "been worked out". About two weeks ago IBM came out with the Cell architecture, which is a fairly radical departure from...it's not just a GPU any more, it's an array of array processors, and so we are seeing an architectural blossoming right now which could make life so confusing for people.

You can see why people end up saying, "You know, I'm a C programmer, I'm just going to stick to my good old C programs and I'm sure they'll be working in a few years and if I invest in one of these bizarre architectures they are likely to have evaporated and I'll have wasted all that effort."

Another problem is that people find it hard to think in parallel. If you read a cookbook, it tells you how to make things and there's no parallelism inside the cookbook. They never tell you to do things at the same time...you know, each step is do this, do this, do this, it's all sequential. Knitting is another very sequential process.
