Nvidia stretches Tesla GPU coprocessors from HPC to big data

'Anything a CPU can do, a GPU can do better'

GTC 2013 Graphics chip maker Nvidia has barely begun to put a dent in the traditional high performance computing segment with its Tesla GPU coprocessors and it is already gearing up to take on new markets. The next target is big data, and as with parallel supercomputing, Nvidia is hoping to get the jump on rivals Intel and AMD, which peddle their respective x86 and GPU coprocessors.

As it turns out, a GPU is not just good at doing floating point math in single precision or double precision; it can also be used to sift through streams of data to index and sort them. It takes a bit of work to repurpose these jobs, which have been run on CPUs for the most part, for a GPU, but Nvidia is seeing more and more companies using its Tesla GPUs to augment the indexing, sorting, and other chewing of large data sets.

This will be one of the themes that Jen-Hsun Huang, co-founder and CEO at Nvidia, covers during his opening keynote at the GPU Technology Conference in San Jose on Tuesday. Sumit Gupta, general manager of the Tesla Accelerated Computing business unit at Nvidia, gave El Reg some examples of the kind of adjacent big data jobs where GPUs are being deployed.

The first example comes from SaaSy CRM software vendor Salesforce.com. As it turns out, Salesforce.com is one of the six companies in the world that has full access to the Twitter firehose. The number of tweets per day was minuscule back in 2007 through 2009, but in 2010 it jumped to about 50 million tweets per day, busted through 200 million per day in 2011, and broke through 500 million last year.

If you want to chew on all that data to do sentiment analysis for CRM customers, as Salesforce.com most certainly wants to do, then you would need at least ten times the iron you needed three years ago to get the job done.
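The "ten times" figure follows from the tweet volumes quoted above. A quick sketch of the arithmetic, assuming (and this is only an assumption) that the hardware needed scales roughly linearly with the number of tweets per day:

```python
# Tweets per day in millions, as quoted in the article (approximate figures).
tweets_per_day_millions = {2010: 50, 2011: 200, 2012: 500}


def growth_factor(volumes, start_year, end_year):
    """Return how many times larger the end-year volume is than the start-year volume."""
    return volumes[end_year] / volumes[start_year]


# Assumption: required iron grows in step with tweet volume.
factor = growth_factor(tweets_per_day_millions, 2010, 2012)
print(f"2010 -> 2012 growth: {factor:.0f}x")  # prints "2010 -> 2012 growth: 10x"
```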

Just as sticking with regular server processors is not practical in terms of cost, bang for the buck, or performance per watt for a lot of traditional HPC shops, Salesforce.com ran into the same issues with the big data munchers it used to store and index the raw Twitter feed.

For one thing, so-called real-time text searches on its CPU clusters holding the Twitter data were taking up to ten minutes to complete, which is better than running in batch mode, but cannot be considered real time.

So Salesforce.com engineers figured out how to index the raw Twitter feed using GPUs to accelerate the CPUs, and they also use GPU offload to match a search term against that index to create baby Twitter feeds that can in turn be pumped over to customers and meshed with their CRM apps. With the GPU assist, Salesforce.com has been able to cut its Twitter text search time to one second or less, and that is basically real-time.
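To make the index-then-match idea concrete, here is a minimal sketch of an inverted index in plain Python. It runs on a CPU and is not Salesforce.com's code; the function names and sample tweets are invented for illustration. The point is only that each word lookup is independent of the others, which is the kind of work that can be spread across many GPU threads.

```python
from collections import defaultdict


def build_index(tweets):
    """Map each lowercase word to the set of tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for word in text.lower().split():
            index[word].add(tweet_id)
    return index


def search(index, tweets, term):
    """Return the tweets matching the term, in id order, as a small filtered feed."""
    return [tweets[i] for i in sorted(index.get(term.lower(), ()))]


# Hypothetical sample data, invented for this example.
tweets = {
    1: "Loving the new phone",
    2: "My phone battery died again",
    3: "Great coffee this morning",
}
index = build_index(tweets)
feed = search(index, tweets, "phone")
print(feed)
```

Here `feed` holds the two tweets that mention "phone", the toy equivalent of a baby Twitter feed for one search term.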

Salesforce.com is not, however, talking about the infrastructure that makes up its Twitter search engine. Marc Benioff would have to kill you if you found out. The odds favor some kind of NoSQL data store running on an x86 cluster, of course.

The Shazam music identification service has also shifted to a ceepie-geepie architecture to sort through the digital fingerprints that it creates to identify songs from any snippet anywhere in the song. The move to GPUs was precipitated by a trebling of searches and records in the past year.

In 2011, the company supported 100 million user inquiries to identify a song snippet, chewing through 10 million records. Last year, the service added hundreds of the then-shipping "Fermi" Tesla GPU coprocessors to its cluster and was able to process 300 million inquiries and index and sort over 27 million records.
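One generic way to match a snippet taken from anywhere in a track is to look up each fingerprint in a table and vote on how the snippet lines up in time with each candidate song. The sketch below shows that idea in plain Python. Whether Shazam's production system works this way is not stated here, and the integer "fingerprints", song names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict


def build_fingerprint_db(songs):
    """Map each fingerprint hash to a list of (song, time offset) pairs."""
    db = defaultdict(list)
    for song, hashes in songs.items():
        for offset, h in enumerate(hashes):
            db[h].append((song, offset))
    return db


def identify(db, snippet):
    """Vote for (song, alignment) pairs and return the best-aligned song, or None."""
    votes = Counter()
    for snippet_offset, h in enumerate(snippet):
        for song, song_offset in db.get(h, ()):
            # Hashes from the true song all agree on one alignment value.
            votes[(song, song_offset - snippet_offset)] += 1
    if not votes:
        return None
    (song, _alignment), _count = votes.most_common(1)[0]
    return song


# Hypothetical fingerprints: small integers standing in for real audio hashes.
songs = {
    "song_a": [11, 42, 7, 93, 58, 21],
    "song_b": [64, 7, 15, 42, 80, 33],
}
db = build_fingerprint_db(songs)
match = identify(db, [7, 93, 58])
print(match)  # prints "song_a"
```

Each lookup and each vote is independent, so the work divides naturally across many parallel threads as the record count grows.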

Gupta says that by offloading a lot of the work from CPUs to GPU coprocessors Shazam can grow its service to cover more music and do searches faster, too, all while keeping the server footprint down. No word on when Shazam will move to the "Kepler" family of Tesla GPU coprocessors, but like other vendors, Shazam has a qualification cycle for adopting new technologies and is working through that process now.

Over at Cortexica Vision Systems, it was not even possible to do the visual recognition that the company is bringing to the shopping experience without the kind of cheap computing you can get from a coprocessor based on a parallel architecture.

The system that Cortexica has cooked up allows shoppers to take a picture of an item they want to comparison shop. Each photo is uploaded into the Cortexica parallel ceepie-geepie, with over 1,000 different points of identification taken from that photo.

Cortexica has built a database of over 1 million apparel items, and can find matches or near matches for the items in a matter of seconds based just on the photos. Cortexica is not a retailer itself, but rather offers its database and search algorithms as a service that shopping sites can embed in their services.
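Finding matches or near matches of this kind can be pictured as a nearest-neighbour search over numeric descriptors. The sketch below is a brute-force version in plain Python, not Cortexica's algorithm. It collapses each item to a single three-number descriptor where the real system takes over 1,000 points per photo, and the catalog entries and function names are made up. Every distance calculation is independent of the others, which is why this sort of search suits a parallel coprocessor.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_items(catalog, query, k=2):
    """Return the names of the k catalog items closest to the query descriptor."""
    ranked = sorted(catalog, key=lambda name: squared_distance(catalog[name], query))
    return ranked[:k]


# Hypothetical catalog: item name -> toy descriptor (values are invented).
catalog = {
    "red dress": [0.9, 0.1, 0.2],
    "blue jeans": [0.1, 0.2, 0.9],
    "crimson skirt": [0.8, 0.2, 0.3],
}
matches = nearest_items(catalog, [0.9, 0.1, 0.25], k=2)
print(matches)  # prints "['red dress', 'crimson skirt']"
```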

This photo matching technology has obvious applicability in other kinds of applications, some of them potentially of a dubious or nefarious nature. That has always been true of every technology humans have created.

Another use for GPU coprocessors that Huang will be talking about is the reformatting of live video feeds using ceepie-geepie appliances from Elemental.

This was the company that supplied the live encoding of video streams from the Summer Olympics in London last year, and its appliances are also used by the Weather Channel to do the same job for its video streams. The latest Elemental boxes have Nvidia's Tesla K10 GPU coprocessors paired one-for-one to an x86 processor in a two-socket rack server.

The Weather Channel serves up video to 38 million viewers on mobile devices each month, and during Superstorm Sandy last October the site handled 12 million concurrent live video streams over the Internet. Each one of those streams was re-encoded on the fly from the live feed from Weather Channel studios to meet the screen sizes and resolutions of various PCs, smartphones, and tablets. ®
