Feeds

HPC 2.0: The monster mash-up

When Big Data gets big, data centres should get nervous

Combat fraud and increase customer satisfaction

Blog This is the second of a three-part series on the convergence of HPC and business analytics, and the implications for data centers. The first article is here; you’re reading the second one; and the third story is coming soon.

The genesis of this set of articles was a recent IBM analyst conference during which the company laid out their HPC strategy. Much of the material and ensuing discussion was about the worlds of HPC and business analytics coming together and what this means for citizens of both worlds, particularly when it comes to dealing with the explosive growth of data. Big data is – well – damned big, as it turns out.

IBM’s Dave Turek took us through the process of analyzing large data sets and the challenges it will present. Not surprisingly, there are a lot of factors to take into account when building or adapting an existing infrastructure to support enterprise analytics.

First, it’s important to realize that the most time-consuming task in processing big data is simply moving the data around. This means getting it onto storage arrays where it can be read by systems, processed, and then the output is stored back onto the arrays.

This looms large when you consider that most analytic processes aren’t just a single workload where data flows in and answers flow out; there are steps performed by different applications on separate systems.

Some will say that this is the case for many business applications already, and our fast networking and fast storage arrays work fine – so what’s the big deal? The big deal is big data and the need for speed.

Data sets range from hundreds of terabytes into the petabyte range – and are growing fast. This isn’t data that’s just going to be sorted and used to build reports; this data needs to be analyzed in near real- time in order to guide decision making.

The weak link is bulk transfers from spinning drives, which are limited to about 1Gb/s or 128MB/sec real-world speed, at best, per spindle. Moving 250TB of data will take almost 5.69 hours using 100 drive spindles or about 40 minutes using 1,000 spindles. The time it takes to move this amount of data multiple times from storage to system, then system to storage adds up – even with thousands of spindles working in concert.

One way to get around this problem is to have data directly transferred from one system to another, which will eliminate the multiple loads and saves from disk storage. With this kind of solution, your overall performance will be limited to the speed of your network – which is probably around 1Gb/s (about the same as a single drive) or maybe 10Gb/s. With large datasets, this is still slower than it could and should be.

So what’s the right answer? We’ll talk about that in Part 3 of this series ... ®

Combat fraud and increase customer satisfaction

Whitepapers

Mobile application security study
Download this report to see the alarming realities regarding the sheer number of applications vulnerable to attack, as well as the most common and easily addressable vulnerability errors.
3 Big data security analytics techniques
Applying these Big Data security analytics techniques can help you make your business safer by detecting attacks early, before significant damage is done.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.