Feeds

Go forth and deduplicate

Will it benefit my data centre?

  • alert
  • submit to reddit

Beginner's guide to SSL certificates

Deduplication use cases

There are many types of data that can benefit from this impressive capacity-reduction potential, including backups, where each stream of backup data is very similar to the last backup, with only small percentage of data changing between each backup. Backups can show deduplication ratios of 20 to one, and are normally much greater. Virtual-machine images, where each image is largely similar to every other, also deduplicate well, with savings of 90 per cent or more in practice.

Deduplication can be used for backup, primary storage, WAN optimisation, archiving, and disaster recovery. In fact, any point where data is stored and transmitted is a candidate.

Points to consider

Deduplication looks like a winner – but, like all technologies, getting the best from it requires understanding where it works well and where it isn't effective as well as the flavours offered by the various vendors.

Not all data types deduplicate as well as others. Some are problematic, such as video streams or geophysical data, for example. Many of these types of data have little to no repetitive data, and may already be compressed. On the other hand, regardless of their data type, backups – which contain large amounts of data that doesn't change on a regular basis – deduplicate well.

But generally most data types and sources of data have some potential for deduplication – home directories and VM images, for example. Deduplicated data may also be slower to access because reconstituting the data (sometimes referred to as "rehydration") may require more processing resources on the storage system than a file that's not been deduplicated, typically in the form of more CPU cycles.

On the other hand, deduped data may be faster to access since less data movement from slow disks is involved. Caching at the storage controller on flash storage devices or in the network itself can considerably reduce the overall I/O load on the disk subsystem. But your mileage may vary, and evaluation of the benefits needs an understanding of the service you are delivering and the data you are managing.

Most data types will benefit from deduplication, as the overheads are small and outweighed by the significant savings, but high-performance applications that require very fast access to their data are not generally good candidates for deduplication.

The bottom line

Data deduplication helps by managing data growth, reducing network bandwidth requirements, and therefore improves capacity and performance efficiencies. Significant cost reductions can be made, from lower administration costs (there's less to manage) to space, power, and cooling outgoings – deduplication helps data centres become greener by reducing the carbon footprint per stored byte.

When evaluating deduplication the answer to the question "Will it benefit my data centre?" generally is: "It will." The success of deduplication technologies to date should encourage every storage administrator to "go forth and deduplicate". ®

This article was written by Alex McDonald, SNIA Europe UK country committee member, NetApp, based on an existing SNIA material. To explore deduplication further, check out this SNIA tutorial: Advanced deduplication concepts (pdf).

To view all of the SNIA tutorials on Data Protection and Management, visit the SNIA Europe website at www.snia-europe.org/en/technology-topics/snia-tutorials/data-protection-and-management.cfm.

Security for virtualized datacentres

More from The Register

next story
It's Big, it's Blue... it's simply FABLESS! IBM's chip-free future
Or why the reversal of globalisation ain't gonna 'appen
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
CAGE MATCH: Microsoft, Dell open co-located bit barns in Oz
Whole new species of XaaS spawning in the antipodes
Microsoft and Dell’s cloud in a box: Instant Azure for the data centre
A less painful way to run Microsoft’s private cloud
AWS pulls desktop-as-a-service from the PC
Support for PCoIP protocol means zero clients can run cloudy desktops
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.