Researchers reveal radical RAID rethink

“Pipelined erasure coding” helps storage to scale at speed

Singaporean researchers have proposed a new way to protect the integrity of data in distributed storage systems and say their “RapidRAID” system offers top protection while consuming fewer network, computing and storage array resources than other approaches.

RAID – redundant arrays of inexpensive disks – has been a storage staple for almost a quarter of a century. The technique involves storing data redundantly across a number of disks, through mirroring or parity, so that failure or loss of a single spindle does not result in data loss. When a drive dies, RAID means a new drive can be added to the array and the data from the failed drive rebuilt onto it. Different “levels” of RAID work with varying numbers of disks and deliver different levels of reliability.

RAID has, of late, become less popular as various scale-out architectures offer different approaches to redundant data storage. The technique is also challenged by multi-terabyte disk drives, as the sheer quantity of data on such disks means rebuilding a drive can take rather longer, and hog more IOPS, than many users are willing to endure.

Erasure codes are one of the techniques challenging RAID and can most easily be understood as a form of metadata: extra coded fragments computed from the original data. They allow fragments of data to be spread across a wider pool of disks, so that the desired data can later be re-assembled using fragments from multiple sources, even if some fragments are missing. Erasure codes feature in the Google File System, Hadoop’s file system, Azure and several commercial products.
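To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity fragment. It is an illustration only, not the Reed-Solomon codes or the RapidRAID codes discussed in this article, and the function names `encode`, `recover` and `decode` are hypothetical. It tolerates the loss of any one fragment.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def recover(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    repaired = list(frags)
    if missing:
        survivors = [f for f in frags if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(frags: List[bytes], length: int) -> bytes:
    """Join the data fragments (all but the last) and strip the padding."""
    return b"".join(frags[:-1])[:length]


if __name__ == "__main__":
    data = b"RapidRAID pipelined erasure coding"
    stored = encode(data, k=4)   # 4 data fragments + 1 parity, one per disk or node
    stored[2] = None             # simulate losing one disk
    assert decode(recover(stored), len(data)) == data
```

Production erasure codes generate several parity fragments using arithmetic over larger finite fields, which lets them survive multiple simultaneous losses.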

Some have even described erasure codes as delivering RAIN – a redundant array of inexpensive nodes – that is positioned as a successor to RAID.

The Singaporean researchers’ work, available on arXiv, proposes a new scheme called RapidRAID that goes beyond other implementations of erasure codes, reducing the amount of storage required to create a viable archive while also speeding the time required to create that archive.

The team thinks this is possible with what it calls “pipelined insertion” under which:

“… the encoding process is distributed among those nodes storing replicated data of the object to be encoded, which exploits data locality and saves network traffic. We then arrange the encoding nodes in a pipeline where each node sends some partially encoded data to the next node, which creates parity data simultaneously on different storage nodes, avoiding the extra time required to distribute the parity after the encoding process is terminated.”
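The pipeline described in that passage can be sketched in a few lines. This is a toy under stated assumptions, not the researchers' code: it uses plain XOR where the paper defines its own coefficients, it simulates the chain of nodes with a loop in one process, and the names `pipeline_encode` and `central_encode` are hypothetical. It shows only the structural point that each node combines the partial result it receives with data it already holds, keeps a coded piece, and forwards the partial result onward.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def pipeline_encode(local_blocks: List[bytes]) -> List[bytes]:
    """Simulate a chain of nodes, where node i already holds local_blocks[i].

    Returns the coded piece each node ends up storing.
    """
    partial = bytes(len(local_blocks[0]))  # all-zero block entering the first node
    stored = []
    for local in local_blocks:                 # one iteration = one node in the chain
        partial = xor_bytes(partial, local)    # combine upstream partial with local data
        stored.append(partial)                 # the node keeps its coded piece locally
        # `partial` is also what this node would send to the next node
    return stored


def central_encode(blocks: List[bytes]) -> List[bytes]:
    """Reference: one encoder holding every block computes the same pieces."""
    return [reduce(xor_bytes, blocks[:i + 1]) for i in range(len(blocks))]


if __name__ == "__main__":
    blocks = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    # The distributed pipeline and the single central encoder agree.
    assert pipeline_encode(blocks) == central_encode(blocks)
```

In this sketch every coded piece is created on the node that stores it, so nothing has to be redistributed after encoding finishes, which is the saving the quoted passage describes.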

The paper then proposes RapidRAID, a family of erasure codes which, just like RAID, offer different levels of data protection.

Tests of the new codes are described in the paper, which compares RapidRAID to the Reed-Solomon erasure codes used in many current implementations. In a test involving 50 thin clients and 16 EC2 instances, the researchers proclaim RapidRAID superior in some ways.

The researchers therefore declare RapidRAID a viable big data enabler, but conclude that there’s more work to be done before it can be declared suitable for applications that require more than two copies of data.

The codes are available for download on GitHub. ®
