
Go forth and deduplicate

Will it benefit my data centre?


Deduplication use cases

Many types of data can benefit from this impressive capacity-reduction potential. Backups are one example: each stream of backup data is very similar to the last, with only a small percentage of data changing between backups. Backups can show deduplication ratios of 20 to one, and the ratios are normally much greater. Virtual-machine images, where each image is largely similar to every other, also deduplicate well, with savings of 90 per cent or more in practice.
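The two figures above use different units: a ratio (20 to one) and a percentage saving (90 per cent). As a rough sketch, assuming a ratio of N to one means N units of logical data occupy one unit of physical storage, the two can be converted as follows. The function names are illustrative and not taken from any vendor tool.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of capacity saved for a deduplication ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Deduplication ratio (the N in N:1) for a fractional saving in [0, 1)."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


# A 20:1 backup ratio corresponds to a 95 per cent saving.
print(f"{savings_from_ratio(20):.0%}")      # 95%
# A 90 per cent saving on VM images corresponds to a 10:1 ratio.
print(f"{ratio_from_savings(0.90):.0f}:1")  # 10:1
```

Note how the saving flattens out as the ratio grows: moving from 10:1 to 20:1 only lifts the saving from 90 to 95 per cent.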

Deduplication can be used for backup, primary storage, WAN optimisation, archiving, and disaster recovery. In fact, any point where data is stored or transmitted is a candidate.

Points to consider

Deduplication looks like a winner – but, as with all technologies, getting the best from it requires understanding where it works well and where it isn't effective, as well as the flavours offered by the various vendors.

Not all data types deduplicate equally well. Some, such as video streams or geophysical data, are problematic. Many of these types of data have little to no repetitive data, and may already be compressed. On the other hand, regardless of their data type, backups – which contain large amounts of data that doesn't change on a regular basis – deduplicate well.
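To see why repetitive data deduplicates and already-compressed data does not, here is a minimal sketch that estimates a deduplication ratio by hashing fixed-size chunks. It assumes fixed 4 KB chunks and SHA-256 fingerprints purely for illustration; real products may use variable-size chunking and other fingerprints. Random bytes stand in for compressed or video data, and repeated blocks stand in for successive backups.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Estimate logical chunks divided by unique chunks (the N in N:1)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 1.0
    # Identical chunks share a fingerprint, so the set keeps one entry each.
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return len(chunks) / len(unique)


block = os.urandom(4096)
backup_like = block * 20              # the same block stored 20 times
random_like = os.urandom(4096 * 20)   # no repetition to find

print(dedup_ratio(backup_like))  # 20.0
print(dedup_ratio(random_like))  # 1.0, barring an astronomically unlikely repeat
```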

But generally most data types and sources of data have some potential for deduplication – home directories and VM images, for example. Deduplicated data may also be slower to access because reconstituting the data (sometimes referred to as "rehydration") may require more processing resources on the storage system, typically in the form of more CPU cycles, than reading a file that has not been deduplicated.

On the other hand, deduped data may be faster to access since less data movement from slow disks is involved. Caching at the storage controller, on flash storage devices, or in the network itself can considerably reduce the overall I/O load on the disk subsystem. But your mileage may vary, and evaluating the benefits needs an understanding of the service you are delivering and the data you are managing.
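The trade-off between rehydration cost and reduced data movement can be pictured with a toy store. This is a sketch only, again assuming fixed 4 KB chunks and SHA-256 keys; `dedup_store` and `rehydrate` are hypothetical names, not any vendor's API. Each unique chunk is kept once, and a file becomes an ordered list of chunk references that must be looked up and reassembled on read.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Keep each unique chunk once, plus an ordered recipe of chunk keys."""
    store = {}
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).digest()
        store.setdefault(key, chunk)  # only the first copy is stored
        recipe.append(key)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original bytes: one lookup per referenced chunk."""
    return b"".join(store[key] for key in recipe)


original = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_store(original)
assert rehydrate(store, recipe) == original
print(len(recipe), "chunks referenced,", len(store), "stored")
# 4 chunks referenced, 2 stored
```

The read path does extra work per chunk (the lookups), but only two chunks ever need to come off disk, which is where caching can pay off.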

Most data types will benefit from deduplication, as the overheads are small and outweighed by the significant savings, but high-performance applications that require very fast access to their data are not generally good candidates for deduplication.

The bottom line

Data deduplication helps manage data growth and reduces network bandwidth requirements, and therefore improves capacity and performance efficiencies. Significant cost reductions can be made, from lower administration costs (there's less to manage) to reduced space, power, and cooling outgoings – deduplication helps data centres become greener by reducing the carbon footprint per stored byte.

When evaluating deduplication, the answer to the question "Will it benefit my data centre?" is generally: "It will." The success of deduplication technologies to date should encourage every storage administrator to "go forth and deduplicate". ®

This article was written by Alex McDonald, SNIA Europe UK country committee member, NetApp, based on existing SNIA material. To explore deduplication further, check out this SNIA tutorial: Advanced deduplication concepts (pdf).

To view all of the SNIA tutorials on Data Protection and Management, visit the SNIA Europe website at www.snia-europe.org/en/technology-topics/snia-tutorials/data-protection-and-management.cfm.
