Original URL: http://www.theregister.co.uk/2011/09/27/big_data_hpc_2_0/

HPC 2.0: The Monster Mash-up

Big Data. Oh yes

By Dan Olds, Gabriel Consulting

Posted in Data Warehousing, 27th September 2011 20:53 GMT

Pt 1. IBM recently invited a handful of really smart HPC-centric industry analysts (and me too, for no apparent reason) to spend the day talking about where the market is going and how IBM intends to address it.

It was truly a conversation, rather than the typical vendor PowerPoint-palooza where they simply run through every product slide deck they can get their hands on.

One of the major threads running through the various presentations and conversations is the convergence of “traditional” HPC/supercomputing, analytics, and mainstream computing. If you’re reading the industry press, you’ll see this trend referred to as ‘Big Data’, ‘Business Intelligence’, or ‘Predictive Analytics’.

These terms are bandied about as if they’re interchangeable and all mean the same thing. They aren’t, and they don’t. I’m not innocent of sowing name confusion; I’ve been using the term ‘HPC 2.0’ to describe the increasing use of HPC-like methods and infrastructure in non-HPC organizations.

Whatever you call it, it’s happening, and it will impose ever-increasing demands on the traditional business data center. The amount of data that organizations will gather and try to analyze is mind-boggling.

In addition to a greater flow of data generated organically by the organization, the best companies are casting their nets wide in an attempt to bring in even more data from outside sources (social network mentions are one example). Sensor technology is also now very cheap and will be increasingly deployed to monitor or track, well, pretty much anything.

When Big Data gets big, data centers should get nervous

This all adds up to an increasingly large volume of data that needs to be sorted, stored, and, yes, analyzed. At a high level, there is somewhere close to a zettabyte (which is 1,000 exabytes, or a billion terabytes) of digital data floating around today. More than 15 petabytes of new data is created daily – data that will in some way be analyzed by someone to figure out whether it presents an opportunity or a threat.
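
For anyone who wants to check the zettabyte arithmetic, here is a minimal sketch. It assumes the standard decimal (SI, powers of 1,000) and binary (IEC, powers of 1,024) unit definitions; the `SI`, `IEC`, and `convert` names are made up for illustration. In decimal units a zettabyte works out to 1,000 exabytes and a billion terabytes. A figure of about 1.07 billion only shows up when both units are counted in binary (zebibytes to tebibytes).

```python
# Byte-unit sizes: decimal (SI) uses powers of 1,000, binary (IEC) powers of 1,024.
SI = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}
IEC = {"TiB": 2**40, "PiB": 2**50, "EiB": 2**60, "ZiB": 2**70}


def convert(amount, src, dst, table):
    """Convert `amount` of unit `src` into unit `dst` using one unit table."""
    return amount * table[src] / table[dst]


zb_in_eb = convert(1, "ZB", "EB", SI)       # exabytes in one zettabyte
zb_in_tb = convert(1, "ZB", "TB", SI)       # terabytes in one zettabyte
zib_in_tib = convert(1, "ZiB", "TiB", IEC)  # tebibytes in one zebibyte

print(f"1 ZB  = {zb_in_eb:,.0f} EB")
print(f"1 ZB  = {zb_in_tb:,.0f} TB")
print(f"1 ZiB = {zib_in_tib:,.0f} TiB")
```

The point is not to mix the two systems in one sentence: 1,000 exabytes is a decimal figure, so the matching terabyte count is the decimal one too.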

So data storage will be a challenge, sure, but storage has never been less expensive, and we can always get more. But the challenges get more challenge-y when you realize that this isn’t just archival data that can be stored away and forgotten.

This data, no matter how obscure or routine, could very well be used as grist for the enterprise analytics mill. The data center will have people on the business side of the organization asking, nay demanding, instant access to data that heretofore was filed away and forgotten.

They’ll also ask for, nay demand, systems that can quickly crunch through reams of data and deliver answers to complex questions in real or near-real time.

In my next installment, I will discuss what these analytic workloads look like, how they act, and how to best architect an infrastructure to handle them. ®