HPC 2.0: The Monster Mash-up
Big Data. Oh yes
Pt 1. IBM recently invited a handful of really smart HPC-centric industry analysts (and me too, for no apparent reason) to spend the day talking about where the market is going and how IBM intends to address it.
It was truly a conversation, rather than the typical vendor PowerPoint-palooza where they simply run through every product slide deck they can get their hands on.
One of the major threads running through the various presentations and conversations is the convergence of “traditional” HPC/supercomputing, analytics, and mainstream computing. If you’re reading the industry press, you’ll see this trend referred to as “Big Data”, “Business Intelligence”, or “Predictive Analytics”.
These terms are bandied about as if they’re interchangeable and all mean the same thing: they aren’t, and they don’t. I’m not innocent of sowing name confusion; I’ve been using the term “HPC 2.0” to describe the increasing use of HPC-like methods and infrastructure in non-HPC organizations.
Whatever you call it, it’s happening, and it will impose ever-increasing demands on the traditional business data center. The amount of data that organizations will gather and try to analyze is mind-boggling.
In addition to a greater flow of data generated organically by the organization, the best companies are casting their nets wide in an attempt to bring in even more data from outside sources (social network mentions are one example). Also, sensor technology is now very cheap and will be increasingly deployed to monitor or track, well, pretty much anything.
When Big Data gets big, data centers should get nervous
This all adds up to an increasingly large volume of data that needs to be sorted, stored, and, yes, analyzed. At a high level, there is somewhere close to a zettabyte (which is 1,000 exabytes, or 1 billion terabytes) of digital data floating around today. More than 15 petabytes of new data is created daily – data that will in some way be analyzed by someone to figure out whether it presents an opportunity or a threat.
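For anyone who wants to check the unit arithmetic, here is a minimal sketch in Python. The table names and the `convert` helper are my own illustration, not anything from IBM. It assumes decimal (SI) prefixes for zettabytes and terabytes, and shows the binary (IEC) equivalents alongside, since the two are easy to mix up: a zebibyte works out to roughly 1.07 billion tebibytes.

```python
# Byte counts for decimal (SI) and binary (IEC) units.
SI = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}
IEC = {"TiB": 2**40, "PiB": 2**50, "EiB": 2**60, "ZiB": 2**70}


def convert(amount, src, dst, table):
    """Convert `amount` of `src` units into `dst` units within one prefix table."""
    return amount * table[src] / table[dst]


zb_in_eb = convert(1, "ZB", "EB", SI)        # decimal: zettabyte -> exabytes
zb_in_tb = convert(1, "ZB", "TB", SI)        # decimal: zettabyte -> terabytes
zib_in_tib = convert(1, "ZiB", "TiB", IEC)   # binary: zebibyte -> tebibytes (2**30)

print(f"1 ZB  = {zb_in_eb:,.0f} EB = {zb_in_tb:,.0f} TB")
print(f"1 ZiB = {zib_in_tib:,.0f} TiB")
```

Either way you count it, the order of magnitude is the same; the two conventions differ only by a few percent at this scale.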
So data storage will be a challenge, sure, but storage has never been less expensive, and we can always get more. But the challenges get more challenge-y when you realize that this isn’t just archival data that can be stored away and forgotten.
This data, no matter how obscure or routine, could very well be used as grist for the enterprise analytics mill. The data center will have people on the business side of the organization asking, nay demanding, instant access to data that heretofore was filed away and forgotten.
They’ll also ask for, nay demand, systems that can quickly crunch through reams of data and deliver answers to complex questions in real or near-real time.
In my next installment, I will discuss what these analytic workloads look like, how they act, and how to best architect an infrastructure to handle them. ®
COMMENTS
Big Data has a variety of different meanings. It can be a large data warehouse that you run data mining and analytics on. It could be an extremely large corporate database for online transaction processing. It could be large unstructured or file system data that is managed through database metadata. Regardless, everyone's data repositories are growing exponentially to capture knowledge (lessons learned, etc.) and accommodate ever-emerging data acquisition technologies. The uses of data are also expanding rapidly for making key business decisions based on multidisciplinary data.
In my experience, if you want the best application performance, you need to make sure the data operands are either in a core's register file or in on-chip cache. HPC to me means optimizing data movement using multilevel buffering from disk to SSD to memory to L3 cache to L2 cache to L1 cache so the processor cores do not stall waiting for data. Big Data means more effective data management, in-memory grids, large fast memory, and optimizing data access, movement, integration, processing, and real-time delivery of results.
My 2 cents
it certainly is a storage challenge....
Just storing the stuff on disks seems easy, doesn't it... But look at what's happening to bandwidth versus capacity growth: bandwidth cannot keep up. It means that you really will see the compute move to where the data is being generated. Perhaps filesystems and storage arrays need to get a lot smarter than they are today.
bollocks
HPC isn't converging with anything, it's almost completely static.
