
HPC 2.0: The monster mash-up

When Big Data gets big, data centres should get nervous


This is the second of a three-part series on the convergence of HPC and business analytics, and the implications for data centers. The first article is here; you're reading the second one; and the third story is coming soon.

The genesis of this set of articles was a recent IBM analyst conference during which the company laid out its HPC strategy. Much of the material and ensuing discussion was about the worlds of HPC and business analytics coming together and what this means for citizens of both worlds, particularly when it comes to dealing with the explosive growth of data. Big data is – well – damned big, as it turns out.

IBM's Dave Turek took us through the process of analyzing large data sets and the challenges it will present. Not surprisingly, there are a lot of factors to take into account when building a new infrastructure, or adapting an existing one, to support enterprise analytics.

First, it's important to realize that the most time-consuming task in processing big data is simply moving the data around. This means getting it onto storage arrays, reading it into the systems that process it, and then writing the output back onto the arrays.

This looms large when you consider that most analytic processes aren't just a single workload where data flows in and answers flow out. Instead, they consist of several steps performed by different applications on separate systems, and each step moves the data again.

Some will say that this is the case for many business applications already, and our fast networking and fast storage arrays work fine – so what’s the big deal? The big deal is big data and the need for speed.

Data sets range from hundreds of terabytes into the petabyte range – and are growing fast. This isn't data that's just going to be sorted and used to build reports; this data needs to be analyzed in near real time in order to guide decision making.

The weak link is bulk transfer from spinning drives, which are limited to about 1Gb/s (roughly 128MB/s) of real-world throughput per spindle, at best. Moving 250TB of data will take about 5.69 hours using 100 drive spindles, or about 34 minutes using 1,000 spindles. Moving this amount of data multiple times, from storage to system and then from system back to storage, adds up – even with thousands of spindles working in concert.
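The spindle arithmetic can be checked with a short Python sketch. It assumes every drive streams at the best-case 128MB/s quoted above, that the drives work perfectly in parallel, and that 1TB is treated as 1,048,576MB (binary units), which is the convention that reproduces the 5.69-hour figure. The helper names `transfer_seconds` and `pipeline_seconds`, and the premise that each analysis stage writes back as much data as it reads, are hypothetical illustrations rather than anything IBM presented.

```python
# Back-of-envelope model of bulk transfers from spinning disks.
# Assumptions: every spindle streams at the best-case 128MB/s, drives work
# perfectly in parallel, and 1TB is treated as 1024 * 1024 MB (binary units).
MB_PER_TB = 1024 * 1024


def transfer_seconds(data_tb: float, spindles: int,
                     mb_per_s_per_spindle: float = 128.0) -> float:
    """Seconds to move data_tb once across `spindles` drives in parallel."""
    if spindles <= 0:
        raise ValueError("spindles must be a positive integer")
    aggregate_mb_per_s = spindles * mb_per_s_per_spindle
    return data_tb * MB_PER_TB / aggregate_mb_per_s


def pipeline_seconds(data_tb: float, spindles: int, stages: int,
                     mb_per_s_per_spindle: float = 128.0) -> float:
    """Hypothetical multi-step analysis: each stage loads its input from the
    arrays and saves an equally sized output back, i.e. two passes per stage."""
    one_pass = transfer_seconds(data_tb, spindles, mb_per_s_per_spindle)
    return 2 * stages * one_pass


for n in (100, 1000):
    secs = transfer_seconds(250, n)
    print(f"250TB on {n} spindles: {secs / 3600:.2f} h ({secs / 60:.1f} min)")

secs = pipeline_seconds(250, 1000, stages=3)
print(f"3-stage pipeline on 1000 spindles: {secs / 3600:.2f} h")
```

Under these idealized assumptions, 250TB takes 20,480 seconds (about 5.69 hours) on 100 spindles and 2,048 seconds (about 34 minutes) on 1,000 spindles. A hypothetical three-stage pipeline that loads and saves the full 250TB at every stage would spend roughly 3.4 hours on disk I/O alone, even with 1,000 spindles.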

One way to get around this problem is to transfer data directly from one system to the next, which eliminates the repeated saves to and loads from disk storage. With this kind of solution, your overall performance will be limited by the speed of your network – which is probably around 1Gb/s (about the same as a single drive) or maybe 10Gb/s. With large data sets, this is still slower than it could and should be.
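To put the network numbers in perspective, the sketch below estimates how long 250TB would take over a single link. It assumes the same convention used above for drives (1Gb/s is about 128MB/s, binary terabytes) and a link running at full line rate; the `efficiency` parameter is a hypothetical allowance for protocol and software overhead, and the function names are illustrative only.

```python
# Rough time to push a data set over a single network link.
# Assumptions: 1Gb/s is treated as 128MB/s (1024 Mb per Gb, 8 bits per byte),
# terabytes are binary, and the link runs at full line rate unless
# `efficiency` (a hypothetical overhead factor) says otherwise.
MB_PER_TB = 1024 * 1024


def link_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gb/s to MB/s using binary units."""
    return gbit_per_s * 1024 / 8


def network_seconds(data_tb: float, gbit_per_s: float,
                    efficiency: float = 1.0) -> float:
    """Seconds to send data_tb over one link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_tb * MB_PER_TB / (link_mb_per_s(gbit_per_s) * efficiency)


for gbps in (1, 10):
    secs = network_seconds(250, gbps)
    print(f"250TB over {gbps}Gb/s: {secs / 86400:.1f} days")
```

Even at full line rate, 250TB needs roughly 23.7 days over a single 1Gb/s link and about 2.4 days over 10Gb/s, against roughly 34 minutes for 1,000 spindles reading in parallel under the idealized assumptions used earlier. Real links with protocol overhead would be slower still.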

So what’s the right answer? We’ll talk about that in Part 3 of this series ... ®

