
HPC 2.0: The monster mash-up

When Big Data gets big, data centres should get nervous


This is the second of a three-part series on the convergence of HPC and business analytics, and the implications for data centers. The first article is here; you’re reading the second one; and the third story is coming soon.

The genesis of this set of articles was a recent IBM analyst conference during which the company laid out its HPC strategy. Much of the material and ensuing discussion was about the worlds of HPC and business analytics coming together and what this means for citizens of both worlds, particularly when it comes to dealing with the explosive growth of data. Big data is – well – damned big, as it turns out.

IBM’s Dave Turek took us through the process of analyzing large data sets and the challenges it presents. Not surprisingly, there are a lot of factors to take into account when building a new infrastructure, or adapting an existing one, to support enterprise analytics.

First, it’s important to realize that the most time-consuming task in processing big data is simply moving the data around. This means getting it onto storage arrays, reading it into the systems that process it, and then writing the output back onto the arrays.

This looms large when you consider that most analytic processes aren’t just a single workload where data flows in and answers flow out; there are steps performed by different applications on separate systems.

Some will say that this is the case for many business applications already, and our fast networking and fast storage arrays work fine – so what’s the big deal? The big deal is big data and the need for speed.

Data sets range from hundreds of terabytes into the petabyte range – and are growing fast. This isn’t data that’s just going to be sorted and used to build reports; this data needs to be analyzed in near real time in order to guide decision making.

The weak link is bulk transfers from spinning drives, which are limited to about 1Gb/s or 128MB/sec real-world speed, at best, per spindle. Moving 250TB of data once will take about 5.7 hours using 100 drive spindles, or about 34 minutes using 1,000 spindles, assuming the data is striped evenly and every spindle streams at full speed. The time it takes to move this amount of data multiple times from storage to system, then system to storage, adds up, even with thousands of spindles working in concert.
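For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It assumes binary units (1TB = 1,024 × 1,024MB), perfectly even striping, and every spindle sustaining the 128MB/sec best case quoted above; the function and constant names are purely illustrative. Real arrays will be slower, so treat the results as a lower bound on transfer time.

```python
MB_PER_TB = 1024 * 1024      # binary units (an assumption; decimal units give slightly lower times)
PER_SPINDLE_MB_S = 128       # best-case real-world rate per drive, as quoted in the article


def transfer_seconds(data_tb, spindles, mb_per_s=PER_SPINDLE_MB_S):
    """Seconds to stream data_tb terabytes once, striped evenly across all spindles."""
    total_mb = data_tb * MB_PER_TB
    aggregate_mb_s = spindles * mb_per_s  # idealized: no contention, no seek overhead
    return total_mb / aggregate_mb_s


if __name__ == "__main__":
    for n in (100, 1000):
        secs = transfer_seconds(250, n)
        print(f"{n} spindles: {secs / 3600:.2f} hours ({secs / 60:.0f} minutes)")
```

Under these assumptions a single pass over 250TB comes to roughly 5.7 hours on 100 spindles and roughly 34 minutes on 1,000. A multi-step analytic pipeline that reads and writes the data at each stage multiplies that figure by the number of passes.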

One way to get around this problem is to have data directly transferred from one system to another, which will eliminate the multiple loads and saves from disk storage. With this kind of solution, your overall performance will be limited to the speed of your network – which is probably around 1Gb/s (about the same as a single drive) or maybe 10Gb/s. With large datasets, this is still slower than it could and should be.
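To put the network limit in perspective, the same sort of rough sum can be run for a single link. This sketch assumes the article's conversion of 1Gb/s to about 128MB/sec, binary terabytes, one link running at full line rate, and no protocol overhead; the names are hypothetical and the numbers are idealized.

```python
MB_PER_TB = 1024 * 1024   # binary units (an assumption)
MB_S_PER_GBIT = 128       # the article's conversion: 1Gb/s is about 128MB/sec


def network_transfer_days(data_tb, link_gbits):
    """Days to push data_tb terabytes across one link of link_gbits Gb/s at full rate."""
    seconds = data_tb * MB_PER_TB / (link_gbits * MB_S_PER_GBIT)
    return seconds / 86400  # 86,400 seconds per day


if __name__ == "__main__":
    for gbits in (1, 10):
        days = network_transfer_days(250, gbits)
        print(f"{gbits}Gb/s link: {days:.1f} days for 250TB")
```

On those assumptions, 250TB over a single 1Gb/s link takes more than three weeks, and a 10Gb/s link still needs more than two days. Multiple links in parallel would cut this down proportionally, but the point stands that a lone system-to-system connection is no match for a large striped array.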

So what’s the right answer? We’ll talk about that in Part 3 of this series ... ®
