Go forth and deduplicate

Will it benefit my data centre?

Deduplication use cases

Many types of data can benefit from this impressive capacity-reduction potential. Backups are one: each stream of backup data is very similar to the last, with only a small percentage of data changing between backups. Backups can show deduplication ratios of 20 to one, and the ratios are normally much greater. Virtual-machine images, where each image is largely similar to every other, also deduplicate well, with savings of 90 per cent or more in practice.
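The figures above mix two units, ratios and percentage savings, so it can help to convert between them. The sketch below assumes the usual definition, where a ratio of N to one means the stored data is 1/N of the logical data; the function names are illustrative and not taken from any product.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of capacity saved for a deduplication ratio of `ratio` to one."""
    if ratio < 1.0:
        raise ValueError("a deduplication ratio cannot be below 1:1")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Deduplication ratio (N in N:1) for a given fraction of capacity saved."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    return 1.0 / (1.0 - savings)


if __name__ == "__main__":
    # A 20:1 backup ratio expressed as a percentage saving.
    print(f"20:1 ratio -> {savings_from_ratio(20):.0%} saved")
    # A 90 per cent saving on VM images expressed as a ratio.
    print(f"90% saved  -> {ratio_from_savings(0.90):.0f}:1 ratio")
```

On this definition a 20 to one ratio is a 95 per cent saving, and a 90 per cent saving is a ratio of 10 to one.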

Deduplication can be used for backup, primary storage, WAN optimisation, archiving, and disaster recovery. In fact, any point where data is stored and transmitted is a candidate.

Points to consider

Deduplication looks like a winner – but, as with all technologies, getting the best from it requires understanding where it works well and where it isn't effective, as well as the flavours offered by the various vendors.

Not all data types deduplicate equally well. Some, such as video streams or geophysical data, are problematic. Many of these data types have little or no repetitive data, and may already be compressed. On the other hand, regardless of their data type, backups – which contain large amounts of data that doesn't change on a regular basis – deduplicate well.
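The contrast between repetitive backups and data with little repetition can be shown with a small experiment. This is a minimal sketch, assuming fixed-size 4KB chunks fingerprinted with SHA-256; real products may use variable-size chunks and other fingerprints, and the helper names and synthetic data here are made up for illustration. Random bytes stand in for already-compressed data such as video.

```python
import hashlib
import random

CHUNK_SIZE = 4096   # assumed fixed chunk size; vendors differ
CHUNKS = 100        # chunks per synthetic stream
NIGHTS = 20         # number of streams in each test set


def dedup_ratio(streams, chunk_size=CHUNK_SIZE):
    """Logical bytes divided by the bytes of unique chunks across all streams."""
    seen = set()
    logical = 0
    stored = 0
    for data in streams:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            logical += len(chunk)
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                stored += len(chunk)  # only new chunks consume capacity
    return logical / stored


rng = random.Random(42)  # seeded so the run is repeatable


def random_bytes(n):
    """n pseudo-random bytes: a stand-in for high-entropy, compressed data."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


# Nightly backups of one volume, with a single chunk changed each night.
volume = bytearray(random_bytes(CHUNKS * CHUNK_SIZE))
backups = []
for night in range(NIGHTS):
    start = night * CHUNK_SIZE
    volume[start:start + CHUNK_SIZE] = random_bytes(CHUNK_SIZE)
    backups.append(bytes(volume))

# The same amount of unrelated high-entropy data.
unrelated = [random_bytes(CHUNKS * CHUNK_SIZE) for _ in range(NIGHTS)]

print(f"backups:   {dedup_ratio(backups):.1f}:1")
print(f"unrelated: {dedup_ratio(unrelated):.1f}:1")
```

With these made-up parameters the backups hold 2,000 logical chunks but only 119 unique ones, a ratio of roughly 17 to one, while the unrelated streams stay at one to one because no chunk ever repeats.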

But generally most data types and sources of data, such as home directories and VM images, have some potential for deduplication.

Deduplicated data may also be slower to access, because reconstituting it (sometimes referred to as "rehydration") may require more processing resources on the storage system, typically CPU cycles, than reading a file that has not been deduplicated.
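Rehydration is easier to picture with a toy chunk store. The sketch below is an assumption-laden illustration, not any vendor's design: it uses fixed-size chunks, SHA-256 fingerprints and in-memory dictionaries, and the class and method names are hypothetical. Each file is kept as a "recipe" of fingerprints, so a read has to look up every chunk and stitch the file back together, which is where the extra processing comes from.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}    # fingerprint -> chunk bytes
        self.recipes = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).digest()
            # Store the chunk only if this fingerprint is new.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.recipes[name] = recipe

    def rehydrate(self, name):
        """Rebuild the original file from its recipe, one lookup per chunk."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Two synthetic "VM images" that share three of their four chunks.
image_a = b"A" * 4096 * 3 + b"B" * 4096
image_b = b"A" * 4096 * 3 + b"C" * 4096

store = ChunkStore()
store.write("vm-a.img", image_a)
store.write("vm-b.img", image_b)

assert store.rehydrate("vm-a.img") == image_a
assert store.rehydrate("vm-b.img") == image_b
print(len(image_a) + len(image_b), "logical bytes,", store.stored_bytes(), "stored")
```

The two images total 32,768 logical bytes but only three distinct chunks (12,288 bytes) are stored; the price is the per-chunk lookup and reassembly on every read.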

On the other hand, deduped data may be faster to access, since less data has to be moved from slow disks. Caching at the storage controller, on flash storage devices, or in the network itself can considerably reduce the overall I/O load on the disk subsystem. But your mileage may vary, and evaluating the benefits needs an understanding of the service you are delivering and the data you are managing.
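Why caching and deduplication work well together can be sketched in a few lines. The example assumes a small least-recently-used cache keyed by chunk fingerprint, with a dictionary standing in for the slow disk; the class name, the cache policy and the string fingerprints are all hypothetical. Ten VM images each reference the same eight operating-system chunks plus one chunk of their own.

```python
from collections import OrderedDict


class ChunkCache:
    """Small LRU cache in front of a slow chunk store (here, a dict)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # fingerprint -> chunk bytes
        self.cache = OrderedDict()  # least recently used entry first
        self.disk_reads = 0

    def read(self, fingerprint):
        if fingerprint in self.cache:
            self.cache.move_to_end(fingerprint)  # mark as recently used
            return self.cache[fingerprint]
        self.disk_reads += 1                     # cache miss: go to disk
        chunk = self.disk[fingerprint]
        self.cache[fingerprint] = chunk
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return chunk


# Eight shared OS chunks and one private chunk per VM image.
disk = {f"os-{i}": b"o" * 4096 for i in range(8)}
disk.update({f"vm-{i}": b"v" * 4096 for i in range(10)})
images = [[f"os-{j}" for j in range(8)] + [f"vm-{i}"] for i in range(10)]

cache = ChunkCache(capacity=16, disk=disk)
logical_reads = 0
for recipe in images:
    for fingerprint in recipe:
        cache.read(fingerprint)
        logical_reads += 1

print(logical_reads, "logical reads,", cache.disk_reads, "disk reads")
```

Reading all ten images takes 90 logical chunk reads but only 18 trips to disk, because the shared chunks are fetched once and then served from the cache. Without deduplication each image's copy of those chunks would be a separate block, and the cache would have nothing to share.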

Most data types will benefit from deduplication, as the overheads are small and outweighed by the significant savings, but high-performance applications that require very fast access to their data are not generally good candidates for deduplication.

The bottom line

Data deduplication helps to manage data growth and reduce network bandwidth requirements, and so improves capacity and performance efficiency. Significant cost reductions can be made, from lower administration costs (there's less to manage) to space, power, and cooling outgoings – deduplication helps data centres become greener by reducing the carbon footprint per stored byte.

When evaluating deduplication, the answer to the question "Will it benefit my data centre?" is generally: "It will." The success of deduplication technologies to date should encourage every storage administrator to "go forth and deduplicate". ®

This article was written by Alex McDonald, SNIA Europe UK country committee member, NetApp, based on existing SNIA material. To explore deduplication further, check out this SNIA tutorial: Advanced deduplication concepts (pdf).

To view all of the SNIA tutorials on Data Protection and Management, visit the SNIA Europe website at www.snia-europe.org/en/technology-topics/snia-tutorials/data-protection-and-management.cfm.
