Nvidia stretches Tesla GPU coprocessors from HPC to big data

'Anything a CPU can do, a GPU can do better'

GTC 2013 Graphics chip maker Nvidia has barely begun to put a dent in the traditional high performance computing segment with its Tesla GPU coprocessors and it is already gearing up to take on new markets. The next target is big data, and as with parallel supercomputing, Nvidia is hoping to get the jump on rivals Intel and AMD, which peddle their respective x86 and GPU coprocessors.

As it turns out, a GPU is not just good at doing floating point math in single precision or double precision; it can also be used to sift through streams of data to index and sort them. It takes a bit of work to repurpose these jobs, which have been run on CPUs for the most part, for a GPU, but Nvidia is seeing more and more companies using its Tesla GPUs to augment the indexing, sorting, and general chewing of large data sets.

This will be one of the themes that Jen-Hsun Huang, co-founder and CEO at Nvidia, will cover during his opening keynote at the GPU Technology Conference in San Jose on Tuesday. Sumit Gupta, general manager of the Tesla Accelerated Computing business unit at Nvidia, gave El Reg some examples of the kind of adjacent big data jobs where GPUs are being deployed.

The first example comes from SaaSy CRM software vendor Salesforce.com. As it turns out, Salesforce.com is one of the six companies in the world that have full access to the Twitter firehose. The number of tweets per day was minuscule back in 2007 through 2009, but in 2010 it jumped to about 50 million tweets per day, busted through 200 million per day in 2011, and broke through 500 million last year.

If you want to chew on all that data to do sentiment analysis for CRM customers, as Salesforce.com most certainly wants to do, then you would need at least ten times the iron you needed three years ago to get the job done.
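The "ten times the iron" figure follows from the tweet volumes quoted above. A rough sketch of the arithmetic, assuming (and this is only an assumption) that the hardware needed scales linearly with daily tweet volume; the `growth_factor` helper is a hypothetical name used for illustration:

```python
# Approximate tweets per day, as quoted in the article.
tweets_per_day = {2010: 50_000_000, 2011: 200_000_000, 2012: 500_000_000}


def growth_factor(volumes, start_year, end_year):
    """Ratio of daily volume in end_year to daily volume in start_year."""
    return volumes[end_year] / volumes[start_year]


# Assumption: compute requirements grow in step with tweet volume.
factor = growth_factor(tweets_per_day, 2010, 2012)

# Average ingest rate implied by the 2012 figure (86,400 seconds per day).
avg_per_second = tweets_per_day[2012] / 86_400

print(f"Volume growth 2010 to 2012: {factor:.0f}x")
print(f"Average tweets per second in 2012: {avg_per_second:.0f}")
```

Peak rates would be higher than the daily average, so this is a floor on the ingest rate rather than a sizing figure.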

Just as sticking with regular server processors is not practical in terms of cost, bang for the buck, or performance per watt for a lot of traditional HPC shops, Salesforce.com ran into the same issues with the big data munchers it used to store and index the raw Twitter feed.

For one thing, so-called real-time text searches on its CPU clusters holding the Twitter data were taking up to ten minutes to complete, which is better than running in batch mode, but cannot be considered real time.

So Salesforce.com engineers figured out how to index the raw Twitter feed using GPUs to accelerate the CPUs, and they also use GPU offload to match a search term against that index to create baby Twitter feeds that can in turn be pumped over to customers and meshed with their CRM apps. With the GPU assist, Salesforce.com has been able to cut its Twitter text search time to one second or less, and that is basically real-time.
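Salesforce.com has not said how its index works, so the following is only a generic, CPU-only sketch of the index-then-match pattern described above, not the company's method. It builds an inverted index (term to set of tweet IDs) and answers a query by intersecting those sets. The names `tokenize`, `build_index`, and `search` are hypothetical:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(tweets):
    """Map each term to the set of tweet IDs (list positions) containing it."""
    index = defaultdict(set)
    for tweet_id, text in enumerate(tweets):
        # Each tweet is tokenized independently of the others, which is
        # the kind of data-parallel work that could be offloaded.
        for term in set(tokenize(text)):
            index[term].add(tweet_id)
    return index


def search(index, query):
    """Return sorted IDs of tweets that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*matches))


tweets = [
    "Loving the new GPU cluster",
    "CPU cluster is slow today",
    "GPU search is fast",
]
index = build_index(tweets)
print(search(index, "gpu"))          # [0, 2]
print(search(index, "cluster gpu"))  # [0]
```

The matching IDs for a search term are what would become a "baby" feed for a given customer.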

Salesforce.com is not, however, talking about the infrastructure that makes up its Twitter search engine. Marc Benioff would have to kill you if you found out. The odds favor some kind of NoSQL data store running on an x86 cluster, of course.

The Shazam music identification service has also shifted to a ceepie-geepie architecture to sort through the digital fingerprints that it creates to identify songs from any snippet anywhere in the song. The move to GPUs was precipitated by a trebling of searches and records in the past year.

In 2011, the company supported 100 million user inquiries to identify a song snippet, chewing through 10 million records. Last year, the service added hundreds of the then-shipping "Fermi" Tesla GPU coprocessors to its cluster and was able to process 300 million inquiries and index and sort over 27 million records.

Gupta says that by offloading a lot of the work from CPUs to GPU coprocessors, Shazam can grow its service to cover more music and do searches faster, too, all while keeping the server footprint down. No word on when Shazam will move to the "Kepler" family of Tesla GPU coprocessors, but like other vendors, Shazam has a qualification cycle for adopting new technologies and is working through that process now.

Over at Cortexica Vision Systems, it was not even possible to do the visual recognition that the company is bringing to the shopping experience without the kind of cheap computing you can get from a coprocessor based on a parallel architecture.

The system that Cortexica has cooked up allows shoppers to take a picture of an item they want to comparison shop. Each photo is uploaded into the Cortexica parallel ceepie-geepie, with over 1,000 different points of identification taken from that photo.

Cortexica has built a database of over 1 million apparel items, and can find matches or near matches for the items in a matter of seconds based just on the photos. Cortexica is not a retailer itself, but rather offers its database and search algorithms as a service that shopping sites can embed in their services.

This photo matching technology has obvious applicability in other kinds of applications, some of them potentially of a dubious or nefarious nature. That has always been true of every technology humans have created.

Another use for GPU coprocessors that Huang will be talking about is the reformatting of live video feeds using ceepie-geepie appliances from Elemental.

This was the company that supplied the live encoding of video streams from the Summer Olympics in London last year, and its appliances are also used by the Weather Channel to do the same job for its video streams. The latest Elemental boxes have Nvidia's Tesla K10 GPU coprocessors paired one-for-one to an x86 processor in a two-socket rack server.

The Weather Channel serves up video to 38 million viewers on mobile devices each month, and during Superstorm Sandy last October the site handled 12 million concurrent live video streams over the Internet. Each one of those streams was re-encoded on the fly from the live feed from Weather Channel studios to meet the screen sizes and resolutions of various PCs, smartphones, and tablets. ®
