Feeds

HPC 2.0: The monster mash-up

When Big Data gets big, data centres should get nervous

Internet Security Threat Report 2014

Blog This is the second of a three-part series on the convergence of HPC and business analytics, and the implications for data centers. The first article is here; you’re reading the second one; and the third story is coming soon.

The genesis of this set of articles was a recent IBM analyst conference during which the company laid out their HPC strategy. Much of the material and ensuing discussion was about the worlds of HPC and business analytics coming together and what this means for citizens of both worlds, particularly when it comes to dealing with the explosive growth of data. Big data is – well – damned big, as it turns out.

IBM’s Dave Turek took us through the process of analyzing large data sets and the challenges it will present. Not surprisingly, there are a lot of factors to take into account when building or adapting an existing infrastructure to support enterprise analytics.

First, it’s important to realize that the most time-consuming task in processing big data is simply moving the data around. This means getting it onto storage arrays where it can be read by systems, processed, and then the output is stored back onto the arrays.

This looms large when you consider that most analytic processes aren’t just a single workload where data flows in and answers flow out; there are steps performed by different applications on separate systems.

Some will say that this is the case for many business applications already, and our fast networking and fast storage arrays work fine – so what’s the big deal? The big deal is big data and the need for speed.

Data sets range from hundreds of terabytes into the petabyte range – and are growing fast. This isn’t data that’s just going to be sorted and used to build reports; this data needs to be analyzed in near real- time in order to guide decision making.

The weak link is bulk transfers from spinning drives, which are limited to about 1Gb/s or 128MB/sec real-world speed, at best, per spindle. Moving 250TB of data will take almost 5.69 hours using 100 drive spindles or about 40 minutes using 1,000 spindles. The time it takes to move this amount of data multiple times from storage to system, then system to storage adds up – even with thousands of spindles working in concert.

One way to get around this problem is to have data directly transferred from one system to another, which will eliminate the multiple loads and saves from disk storage. With this kind of solution, your overall performance will be limited to the speed of your network – which is probably around 1Gb/s (about the same as a single drive) or maybe 10Gb/s. With large datasets, this is still slower than it could and should be.

So what’s the right answer? We’ll talk about that in Part 3 of this series ... ®

Top 5 reasons to deploy VMware with Tegile

More from The Register

next story
Facebook pays INFINITELY MORE UK corp tax than in 2012
Thanks for the £3k, Zuck. Doh! you're IN CREDIT. Guess not
Facebook, Apple: LADIES! Why not FREEZE your EGGS? It's on the company!
No biological clockwatching when you work in Silicon Valley
Happiness economics is bollocks. Oh, UK.gov just adopted it? Er ...
Opportunity doesn't knock; it costs us instead
Sysadmin with EBOLA? Gartner's issued advice to debug your biz
Start hoarding cleaning supplies, analyst firm says, and assume your team will scatter
YARR! Pirates walk the plank: DMCA magnets sink in Google results
Spaffing copyrighted stuff over the web? No search ranking for you
Don't bother telling people if you lose their data, say Euro bods
You read that right – with the proviso that it's encrypted
Apple SILENCES Bose, YANKS headphones from stores
The, er, Beats go on after noise-cancelling spat
prev story

Whitepapers

Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.