Original URL: http://www.theregister.co.uk/2011/12/01/letting_gpus_run_free/

Letting GPUs run free

No single system silo = big step forward

By Dan Olds, Gabriel Consulting

Posted in HPC, 1st December 2011 08:02 GMT

One of the most interesting things I saw at SC11 was a joint Mellanox and University of Valencia demonstration of rCUDA over Infiniband. With rCUDA, applications can access a GPU (or multiple GPUs) on any other node in the cluster. It makes GPUs a sharable resource and is a big step towards making them as virtualisable (I don’t think that’s a word, but I’m going to go with it anyway) as any other compute resource.
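
Broadly, rCUDA works by API remoting: a client-side library stands in for the CUDA runtime on the node without a GPU and forwards each call over the network to a server daemon on the node that owns the GPU. The toy Python sketch below illustrates only that forwarding pattern, under loud assumptions: it uses no CUDA at all, the wire format (length-prefixed pickled messages) and the operation names `vector_add` and `scale` are hypothetical, plain list arithmetic stands in for GPU kernels, and a local socket pair plays the part of the Infiniband link.

```python
import pickle
import socket
import struct
import threading

# Hypothetical toy protocol (NOT rCUDA's real wire format): each message is a
# 4-byte big-endian length followed by a pickled payload.

def send_msg(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

# Stand-ins for work that would really run on the remote node's GPU.
DEVICE_OPS = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, k: [x * k for x in a],
}

def gpu_server(sock):
    """Runs on the node that owns the GPU: execute forwarded calls."""
    while True:
        request = recv_msg(sock)
        if request is None:  # shutdown signal from the client
            break
        op, args = request
        send_msg(sock, DEVICE_OPS[op](*args))

class RemoteGPU:
    """Client-side proxy: looks like a local API, but forwards every call."""

    def __init__(self, sock):
        self.sock = sock

    def call(self, op, *args):
        send_msg(self.sock, (op, args))
        return recv_msg(self.sock)

    def close(self):
        send_msg(self.sock, None)

if __name__ == "__main__":
    # A local socket pair simulates the network link between the two nodes.
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=gpu_server, args=(server_end,))
    server.start()
    gpu = RemoteGPU(client_end)
    print(gpu.call("vector_add", [1, 2, 3], [10, 20, 30]))
    print(gpu.call("scale", [1, 2, 3], 2))
    gpu.close()
    server.join()
    client_end.close()
    server_end.close()
```

The point of the design is that the application only ever talks to the proxy, so it cannot tell whether the "device" is local or several hops away; the cost is a network round trip per call, which is why a low-latency interconnect like Infiniband matters.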

There aren’t many details out there yet; so far there’s a press release from Mellanox and the University of Valencia, plus the rCUDA project’s own explanation of how it works.

This is a big deal. To me, the future of computing will be much more heterogeneous and hybrid than homogeneous and, well, some other word that means ‘common’ and begins with ‘H’. We’re moving into a mindset where systems are designed to handle particular workloads, rather than workloads that are modified to run sort of well on whatever systems are cheapest per pound or flop.

Properly designed workload-optimised systems, when running their target workload, are almost always more efficient (sometimes by orders of magnitude) than general purpose commodity systems. The efficiency delta makes them less expensive per unit of throughput and often more thrifty in terms of energy usage or floor space too. Hybrid systems combining GPUs and CPUs are examples of this type of workload optimisation.

While these hybrid boxes are making great headway in HPC, they’ve yet to make much progress into the typical commercial data centre. Not every company has an overwhelming, aching need for the performance and efficiency that GPUs can provide, but many of them have at least a few workloads that could readily benefit from some speedy GPU goodness.

As more organisations start to do heavy analytics or get deeper into CAD, EDA and the like, they’ll find other places where GPUs make sense. Without rCUDA (or something like it), though, they’ll have to buy a GPU (or add one as an upgrade) for every system that runs these GPU-friendly workloads. With rCUDA, the GPU-system silo is opened up, so applications on the many systems without GPUs can use the slack GPU capacity on other boxes.
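
A little arithmetic shows why pooling slack capacity pays off. The Python sketch below uses entirely hypothetical numbers (12 nodes, each needing about 2 GPU-hours per 24-hour day) and looks only at averages; it ignores the case where many jobs want a GPU at the same moment, which is why it sizes the pool for a conservative 50% target utilisation rather than 100%.

```python
import math

def average_utilisation(demand_gpu_hours, num_gpus, period_hours):
    """Fraction of the available GPU-hours actually used over the period."""
    return demand_gpu_hours / (num_gpus * period_hours)

def min_pooled_gpus(demand_gpu_hours, period_hours, target_utilisation):
    """Smallest shared pool that keeps average utilisation at or below the target."""
    return math.ceil(demand_gpu_hours / (period_hours * target_utilisation))

if __name__ == "__main__":
    nodes = 12              # hypothetical: servers running GPU-friendly jobs
    hours_per_node = 2.0    # hypothetical: GPU-hours each node needs per day
    period = 24.0           # one day
    demand = nodes * hours_per_node

    # Siloed: one dedicated GPU bolted into every node.
    print(f"dedicated: {nodes} GPUs, "
          f"{average_utilisation(demand, nodes, period):.1%} utilised")

    # Shared: a small pool reachable over the network, sized for 50% load.
    pool = min_pooled_gpus(demand, period, target_utilisation=0.5)
    print(f"pooled:    {pool} GPUs, "
          f"{average_utilisation(demand, pool, period):.1%} utilised")
```

Under these made-up figures the same work needs 2 shared GPUs instead of 12 dedicated ones, and each GPU goes from roughly 8% to 50% busy; real numbers depend heavily on how bursty the workloads are.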

Sharing GPUs means higher utilisation rates, an even better price/performance ratio, and a stronger business case in favour of jumping into the hybrid system pool. It’s a good development and definitely one to watch. ®