GPUs: Sharing good, islands bad

New words needed!

GPU Video Blog

I’ve talked to several folks at the 2010 GPU Tech Conference about the burgeoning need to dynamically share GPUs across multiple systems without having to re-cable boxes or bog down the system by moving processing from one box to another. Dell put forward a solution with its C410x PCIe extension box, which allows eight systems to be connected to 16 PCIe devices – including GPUs.

While this is a good thing and a solid first step, it doesn’t quite get us to the point where these devices can be used with the flexibility of, say, a printer or other network-attached device. Having this capability is important because it opens up GPUs to a much wider set of users in both HPC and enterprise data centers. It makes them good cloud citizens, too.

On the last day of the show, I visited NextIO and found Kyle Geisler, who gave me an overview of how they’re separating the traditional server, with its CPUs and memory, from I/O. What they’ve done is build a box that hosts up to 16 PCIe devices, like GPUs, with connectivity to 24 individual servers. The devices don’t have to be GPUs; they could be InfiniBand or Ethernet adapters or any other PCIe-based I/O gadget.

These devices can be devoted to any or none of the 24 attached servers, and attached and detached logically. NextIO has implemented hot-plug PCIe on its system, so dynamic logical attach and detach isn’t a problem and doesn’t require a reboot or anything more than a few clicks on the GUI management screen. But, as Kyle explains in the video, most customers use APIs provided by NextIO to do the dynamic switching from within their own programs or management stack.
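To give a flavour of what “dynamic switching from within their programs” boils down to on the host side, here is a minimal Python sketch. The reassign_device() call is purely hypothetical, a stand-in for whatever the vendor’s management API actually exposes (it is not NextIO’s real interface); the sysfs writes, by contrast, are the standard Linux hot-plug hooks that let a PCIe device come and go without a reboot.

    # Hypothetical sketch: moving a GPU between servers on a PCIe fabric.
    # Assumes Linux hosts; the sysfs writes need root privileges.

    def detach_gpu(bdf):
        # Ask the host kernel to drop the device (bdf is its PCI address,
        # e.g. "0000:0a:00.0") before the fabric unmaps it from this server.
        with open("/sys/bus/pci/devices/%s/remove" % bdf, "w") as f:
            f.write("1")

    def rescan_pci_bus():
        # Once the fabric maps the device to this host, rescan so the kernel
        # enumerates it and loads the driver; no reboot required.
        with open("/sys/bus/pci/rescan", "w") as f:
            f.write("1")

    # Illustrative workflow; reassign_device() is a placeholder for the
    # vendor's own management API, not a real call:
    #   detach_gpu("0000:0a:00.0")                        # release on server A
    #   reassign_device(gpu="0000:0a:00.0", to="server-b")
    #   rescan_pci_bus()                                   # server B picks it up

The point of the API route over the GUI is simply that a job scheduler or management stack can fold steps like these into its own workflow, grabbing a GPU for a job and handing it back when the job finishes.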

The most recent news from NextIO is the introduction of a more bite-sized GPU solution, the vCORE Express 2070. It’s a 1U chassis that holds up to four Tesla M2050 or M2070 GPUs: a starter solution for customers who have modest needs or are looking for an easy (and less expensive) first step into GPU computing.

The NextIO product line (and other products that are on the way) is a big step toward virtualized GPUs, but we’re not quite there yet. The GPUs can be devoted to servers and hot-switched between systems, but they can’t be shared in the same way as a typical general-purpose processor. I think we’ll see more advances on this front sooner rather than later, particularly as GPUs find their way deeper into the enterprise space.

As I’ve often said in these blogs, the biggest single trend I see happening in business is a move toward much greater use of analytics, particularly predictive analytics. My most complete rant on this topic is here, for anyone wanting to see my rationale.

If I’m right about the trend, then there’s a whole new world of pain coming at data centers as business units demand more data and the computing power to crunch it in wildly varying ways – oh, and fast enough gear to provide the results in near-real time. I think that GPUs will be a key component in enterprise analytics infrastructures. They’re very fast on this type of work, and the ecosystem has come a long way in just a few years. We’ll soon be at the point where there is vendor support for most of the analytic routines that a business would want to run.
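For a sense of why GPUs suit this kind of work, here’s a minimal sketch using PyCUDA (assuming a CUDA-capable card and the pycuda package are installed; the toy “model” is purely illustrative, not anyone’s real analytics code). A scoring-style pass over millions of records runs as data-parallel kernels, which is exactly the shape of a lot of predictive-analytics number crunching.

    import numpy as np
    import pycuda.autoinit              # initialises the first CUDA device
    import pycuda.gpuarray as gpuarray

    # Ten million synthetic feature values standing in for a scoring workload.
    features = np.random.randn(10000000).astype(np.float32)
    gpu_features = gpuarray.to_gpu(features)    # copy the data to device memory

    # A toy "model": weight, shift, then aggregate. Each step runs as a
    # data-parallel kernel across all ten million elements on the GPU.
    score = gpuarray.sum(gpu_features * 0.8 + 1.5).get()
    print("aggregate score: %f" % score)

The same operation is trivially serial on a CPU core; on a GPU it is spread across hundreds of cores, which is why this class of embarrassingly parallel crunching is where the speed-ups come from.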

I think that enterprises are going to adopt GPUs in piecemeal fashion. I don’t see most companies buying huge analytic infrastructures in a single swipe; rather, they will add this capability over time on a project-by-project basis. The ability to make GPUs a shared resource will make justifying them – and the additional investment in time and code to utilize them – an easier decision to push through the organization.

In this final video from the GPU technical conference, it’s also noteworthy to see how my term “GPU-riffic” has come into common usage. With only 10 minutes of cajoling, Kyle and his PR representative were talked into actually saying it live for use at the tail end of the video.

While I can put together a good case for my former boss and myself being the first to use the term “server sprawl,” I don’t have much in the way of proof other than some old slides from 1996. With “GPU-riffic,” it’s different: I have evidence that I’m the first guy to say it. It’s annoying and it’s moronic and it’s mine, damn it! ®
