GPUs: Sharing good, islands bad

New words needed!

GPU Video Blog: I’ve talked to several folks at the 2010 GPU Tech Conference about the burgeoning need to share GPUs dynamically across multiple systems without having to re-cable boxes or bog down the system by moving processing from one box to another. Dell put forward a solution with its C410x PCIe expansion box, which allows eight systems to be connected to 16 PCIe devices, including GPUs.

While this is a good thing and a solid first step, it doesn’t quite get us to the point where these devices can be used with the flexibility of, say, a printer or other network attached device. Having this capability is important because it opens up GPUs to a much wider set of users in both HPC and enterprise data centers. It makes them good cloud citizens, too.

On the last day of the show, I visited NextIO and found Kyle Geisler, who gave me an overview of how they’re separating the traditional server, with its CPUs and memory, from I/O. What they’ve done is build a box that hosts up to 16 PCIe devices, like GPUs, with connectivity to 24 individual servers. The devices don’t have to be GPUs; they could be InfiniBand or Ethernet adapters or any other PCIe-based I/O gadget.

These devices can be devoted to any or none of the 24 attached servers, and attached and detached logically. NextIO has implemented hot plug PCIe on their system so that dynamic logical attach and detach isn’t a problem and doesn’t require reboots, or anything more than some clicks on the GUI management screen. But, as Kyle explains in the video, most customers are using APIs provided by NextIO to accomplish dynamic switching from within their programs or their own management stack.
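To make the attach/detach idea concrete, here is a minimal Python sketch of the bookkeeping involved. It is a toy model only: the `DevicePool` class and its method names are made up for illustration and are not NextIO's API, which isn't described here. It assumes the 16-slot, 24-server layout mentioned above and that a device belongs to at most one server at a time.

```python
class DevicePool:
    """Toy model of a shared PCIe chassis. Hypothetical; not NextIO's API."""

    def __init__(self, num_slots=16, num_servers=24):
        self.num_servers = num_servers
        # Map each slot to the server that owns it, or None if unassigned.
        self.owner = {slot: None for slot in range(num_slots)}

    def attach(self, slot, server):
        """Logically attach the device in `slot` to `server`."""
        if slot not in self.owner:
            raise KeyError(f"no such slot: {slot}")
        if not 0 <= server < self.num_servers:
            raise ValueError(f"no such server: {server}")
        if self.owner[slot] is not None:
            # A device is devoted to one server at a time; detach it first.
            raise RuntimeError(f"slot {slot} is already attached")
        self.owner[slot] = server

    def detach(self, slot):
        """Release the device in `slot`; return the previous owner (or None)."""
        previous = self.owner[slot]
        self.owner[slot] = None
        return previous

    def devices_of(self, server):
        """List the slots currently attached to `server`."""
        return sorted(s for s, o in self.owner.items() if o == server)


pool = DevicePool()
pool.attach(0, server=3)
pool.attach(1, server=3)
pool.detach(0)             # hot-switch slot 0 ...
pool.attach(0, server=7)   # ... over to a different server
print(pool.devices_of(3), pool.devices_of(7))
```

In a real deployment the attach and detach calls would trigger PCIe hot-plug events on the servers involved; the sketch tracks only who owns what.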

The most recent news from NextIO is their introduction of a more bite-sized GPU solution with their vCORE Express 2070 product. It’s a 1U chassis that holds up to four Tesla M2050 or M2070 GPUs. It’s a GPU starter solution for customers who have modest needs or are looking for an easy (and less expensive) first step into GPU computing.

The NextIO product line (along with other products that are on the way) is a big step toward virtualized GPUs, but we’re not quite there yet. The GPUs can be devoted to servers and hot switched between systems, but they can’t be shared in the same way as a typical general-purpose processor. I think that we’ll see more advances on this front sooner rather than later, particularly as GPUs find their way deeper into the enterprise space.

As I’ve often said in these blogs, the biggest single trend I see happening in business is a move toward much greater use of analytics, particularly predictive analytics. My most complete rant on this topic is here, for anyone wanting to see my rationale.

If I’m right about the trend, then there’s a whole new world of pain coming at data centers as business units demand more data and the computing power to crunch it in wildly varying ways – oh, and fast enough gear to provide the results in near-real time. I think that GPUs will be a key component in enterprise analytics infrastructures. They’re very fast on this type of work, and the ecosystem has come a long way in just a few years. We’ll soon be at the point where there is vendor support for most of the analytic routines that a business would want to run.

I think that enterprises are going to adopt GPUs in piecemeal fashion. I don’t see most companies buying huge analytic infrastructures in a single swipe; rather they will add this capability over time on a project-by-project basis. The ability to make GPUs a shared resource will make justifying them – and the additional investment in time and code to utilize them – an easier decision to push through the organization.

In this final video from the GPU Tech Conference, it’s also noteworthy to see how my term “GPU-riffic” has come into common usage. With only 10 minutes of cajoling, Kyle and his PR representative were talked into actually saying it live for use at the tail end of the video.

While I can put together a good case for my former boss and myself being the first to use the term “server sprawl,” I don’t have much in the way of proof other than some old slides from 1996. With “GPU-riffic,” it’s different: I have evidence that I’m the first guy to say it. It’s annoying and it’s moronic and it’s mine, damn it! ®
