The cost of not deduping

Think pouring budget down drain

In our homes and offices duplicated information is such a fact of life we don’t even think about it. In the digital world though we can think about it, and should, because it can stop a lot of wasteful spending.

Imagine a department of twenty people. They each have their own filed copies of their employment contract and a pensions scheme guide. You can see them filed in twenty desk drawers. More than ninety percent of the information in these documents is duplicated, but we individuals don’t think much about it because we have our own copy of the documents.

Now let's computerise this and give the twenty people electronic copies of their employment contract and pension scheme guide, stored on a departmental server’s disk drive and consuming 10MB of capacity per person; that’s 200MB. With deduplication technology we can detect that there are multiple copies of 90 per cent of the information and eliminate them, replacing them with pointers to a single master copy of the data. We keep one 9MB copy of the shared material plus 1MB of unique data for each of the twenty people, so our 200MB just decreased to 29MB, a roughly 7:1 deduplication ratio.
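To make the pointer idea concrete, here is a minimal Python sketch of the department example, scaled down so that one 1KB chunk stands in for each megabyte. The fixed-size chunking, the SHA-256 fingerprints and the names `dedupe`, `restore` and `CHUNK` are all assumptions made for illustration; real deduplication products choose their own chunking and indexing schemes.

```python
import hashlib

CHUNK = 1024  # illustrative chunk size (an assumption, not a product setting)


def dedupe(files, chunk_size=CHUNK):
    """Store each unique chunk once and describe every file as a list of pointers."""
    store = {}    # chunk hash -> chunk bytes (the single master copy)
    recipes = {}  # file name -> ordered list of chunk hashes (the pointers)
    for name, data in files.items():
        pointers = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy seen
            pointers.append(digest)
        recipes[name] = pointers
    return store, recipes


def restore(store, recipes, name):
    """Rebuild a file by following its pointers back to the stored chunks."""
    return b"".join(store[digest] for digest in recipes[name])


# Scaled-down department: 9 shared chunks plus 1 unique chunk per person.
shared = b"".join(bytes([i]) * CHUNK for i in range(9))
files = {
    f"person-{n}": shared + f"employee-{n}".encode().ljust(CHUNK, b"\0")
    for n in range(20)
}

store, recipes = dedupe(files)
raw_chunks = sum(len(r) for r in recipes.values())  # chunks before deduplication
kept_chunks = len(store)                             # chunks actually stored
print(f"{raw_chunks} -> {kept_chunks} chunks, ratio {raw_chunks / kept_chunks:.1f}:1")
# prints "200 -> 29 chunks, ratio 6.9:1"
```

Every file can still be rebuilt in full with `restore`, which is the point: the copies go, the information stays.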

Deduplication technology can be applied with spectacular results to backup data, where, with repetitive daily and weekly backups, there can be a huge amount of redundant data. There can be terabytes of backup data stored on disk in a reasonably large organisation and effective deduplication ratios for full backups can approach 20:1. That means a 20TB backup store can be reduced to 1TB, and a 200TB one to 10TB.

Assume we are using 1TB disk drives; we don’t need 200 of them for the 200TB store; instead we only need 10. This is the main deduplication saving: you don’t need to buy as much disk capacity. If the backup drive array has RAID protection then you also need less capacity for the RAID parity data and mirrored copies.
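The drive arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: `drives_needed` is a hypothetical helper, and it ignores RAID overhead, hot spares and filesystem formatting losses.

```python
import math


def drives_needed(logical_tb, dedup_ratio, drive_tb=1.0):
    """Drives required to hold logical_tb of backups at a given deduplication ratio."""
    physical_tb = logical_tb / dedup_ratio  # capacity actually written to disk
    return math.ceil(physical_tb / drive_tb)


print(drives_needed(200, dedup_ratio=1))   # no deduplication: 200 drives
print(drives_needed(200, dedup_ratio=20))  # 20:1 deduplication: 10 drives
```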

If you send a copy of your backup data off-site for disaster recovery and business continuity reasons, then you only need a network link capable of transmitting 10TB in a reasonable time, rather than a much more expensive one sized for 200TB. And the destination data centre only needs 10TB of storage capacity for this data, not 200TB: another saving.
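To put a rough number on "a reasonable time", the sketch below estimates how long each data set would take to cross a link. The 1Gbit/s link speed is an assumed figure chosen purely for illustration, `transfer_hours` is a hypothetical helper, and the sum ignores protocol overhead and contention.

```python
def transfer_hours(data_tb, link_gbps):
    """Hours needed to send data_tb (decimal terabytes) over a link_gbps link."""
    bits = data_tb * 1e12 * 8             # terabytes -> bits
    seconds = bits / (link_gbps * 1e9)    # bits / (bits per second)
    return seconds / 3600


for size_tb in (10, 200):
    hours = transfer_hours(size_tb, link_gbps=1)
    print(f"{size_tb}TB at 1Gbit/s: about {hours:.0f} hours")
```

On these assumptions the deduplicated 10TB goes across in under a day, while the full 200TB would tie up the same link for more than two weeks.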

There are further savings as a result. Your data centre power costs go down because, instead of spinning 200 disk drives, you only need to have ten spinning. That means less heat is generated and so your data centre cooling bill is lowered as well.

With fewer disk drives spinning, the chance of one of them failing is lessened and your data are therefore more available.

The savings from deduplication are multiplied by these effects.

The technology is beginning to be applied to nearline and primary data as well as to backup data, as processing power increases and the technology is improved. Multi-core storage controllers can deduplicate data pretty quickly and newer technologies such as Permabit’s Albireo take deduplication out of the data access path.

As and when data is stored in the cloud - remote data centres at the end of a wide area network link - then deduplication means you only pay for the storage of unique data there, and not multiple copies of a PowerPoint presentation or an image that has been identically attached to twenty emails. All those repeated clauses in the twenty HR employment contracts and pension scheme guides we mentioned earlier can be stripped out leaving just the unique data.

In our homes and offices duplication of paper-stored information is a fact of life and we don’t even think about it. In the digital world such duplication is a money-wasting sin and we should and must think about it, so we can spend our money where it can do good instead of being poured down a drain. ®
