The cost of not deduping

Think pouring budget down drain

In our homes and offices duplicated information is such a fact of life we don’t even think about it. In the digital world, though, we can and should think about it, because eliminating it can stop a lot of wasteful spending.

Imagine a department of twenty people. They each have their own filed copies of their employment contract and a pension scheme guide. You can see them filed in twenty desk drawers. More than ninety per cent of the information in these documents is duplicated, but as individuals we don’t think much about it because we each have our own copy of the documents.

Now let’s computerise this and give the twenty people electronic copies of their employment contract and pension scheme guide, stored on a departmental server’s disk drive and consuming 10MB of capacity per person; that’s 200MB. With deduplication technology we can detect that there are multiple copies of 90 per cent of the information and eliminate them, replacing them with pointers to a single master copy of the data. We keep one 9MB copy of the shared material plus 1MB of unique data for each of the twenty people, so our 200MB just decreased to 29MB, a roughly 7:1 deduplication ratio.
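The sum behind that figure can be checked in a few lines. This is only a back-of-envelope sketch: the function name and its parameters are made up for illustration, and it assumes the shared 90 per cent is identical in every copy and that the pointers themselves take up no space.

```python
def deduped_size_mb(copies, size_each_mb, dup_percent):
    """Size after deduplication when dup_percent of every copy is shared content."""
    shared = size_each_mb * dup_percent / 100   # stored once, as the master copy
    unique = size_each_mb - shared              # stays with each individual copy
    return shared + copies * unique

raw = 20 * 10                                   # twenty people, 10MB each
stored = deduped_size_mb(copies=20, size_each_mb=10, dup_percent=90)
ratio = raw / stored
print(f"{raw}MB raw -> {stored:g}MB stored, about {ratio:.1f}:1")
# prints: 200MB raw -> 29MB stored, about 6.9:1
```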

In the digital world duplication is a money-wasting sin

Deduplication technology can be applied with spectacular results to backup data, where, with repetitive daily and weekly backups, there can be a huge amount of redundant data. There can be terabytes of backup data stored on disk in a reasonably large organisation and effective deduplication ratios for full backups can approach 20:1. That means a 20TB backup store can be reduced to 1TB, and a 200TB one to 10TB.
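To see why repeated full backups dedupe so well, here is a toy chunk store. It is a sketch of the general idea rather than any vendor's implementation: the `ChunkStore` class, the fixed 4KB chunk size and the made-up backup contents are all assumptions, and real products often use variable-size chunks and must account for index overhead.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes (the single master copy)
        self.backups = {}   # backup name -> list of digests (the pointers)

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if not seen before
            pointers.append(digest)
        self.backups[name] = pointers

    def read(self, name):
        """Rebuild a backup by following its pointers."""
        return b"".join(self.chunks[d] for d in self.backups[name])

    def logical_bytes(self):
        return sum(len(self.chunks[d]) for ptrs in self.backups.values() for d in ptrs)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


def make_backup(day):
    """Hypothetical full backup: 100 chunks, of which only the first changes each day."""
    changed = bytes([100 + day]) * CHUNK_SIZE
    unchanged = b"".join(bytes([i]) * CHUNK_SIZE for i in range(1, 100))
    return changed + unchanged


store = ChunkStore()
for day in range(20):
    store.write(f"day-{day}", make_backup(day))

print(f"ratio: {store.logical_bytes() / store.stored_bytes():.1f}:1")
```

Twenty fulls that differ by one chunk a day are stored as 119 chunks instead of 2,000. How close a real backup set gets to 20:1 depends on how much of the data actually changes between runs.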

Assume we are using 1TB disk drives: we don’t need 200 of them for the 200TB store; we only need 10. This is the main deduplication saving: you don’t need to buy as much disk capacity. If the backup storage array has RAID protection, you also need less capacity for the RAID parity data and mirrored copies.

If you send a copy of your backup data off-site for disaster recovery and business continuity reasons, then you only need a network link capable of transmitting 10TB in a reasonable time rather than a much more expensive one sized for 200TB. And the destination data centre’s storage capacity requirement for this data is 10TB and not 200TB: another saving.
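A rough calculation shows what that means for the link. The 1Gbit/s line speed below is an assumption picked for illustration, not a figure from any particular setup, and the sketch ignores protocol overhead and contention.

```python
def transfer_hours(terabytes, link_mbit_per_s):
    """Hours to send `terabytes` (decimal TB) over a link, ignoring protocol overhead."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 3600

LINK_MBIT = 1000  # assumed 1Gbit/s WAN link
for tb in (200, 10):
    hours = transfer_hours(tb, LINK_MBIT)
    print(f"{tb}TB: {hours:.0f} hours ({hours / 24:.1f} days)")
# prints:
# 200TB: 444 hours (18.5 days)
# 10TB: 22 hours (0.9 days)
```

On that assumed link the deduplicated copy goes across in under a day, while the full 200TB would take more than a fortnight or need a far fatter pipe.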

There are further savings as a result. Your data centre power costs go down because, instead of spinning 200 disk drives, you only need to have ten spinning. That means less heat is generated and so your data centre cooling bill is lowered as well.

With fewer disk drives spinning, the chance of one of them failing is reduced and your data are therefore more available.

The savings from deduplication are multiplied by these effects.

The technology is beginning to be applied to nearline and primary data as well as to backup data, as processing power increases and the technology is improved. Multi-core storage controllers can deduplicate data pretty quickly and newer technologies such as Permabit’s Albireo take deduplication out of the data access path.

As and when data is stored in the cloud - remote data centres at the end of a wide area network link - then deduplication means you only pay for the storage of unique data there, and not multiple copies of a PowerPoint presentation or an image that has been identically attached to twenty emails. All those repeated clauses in the twenty HR employment contracts and pension scheme guides we mentioned earlier can be stripped out leaving just the unique data.
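The attachment case is the simplest form of the idea: whole-file deduplication, where identical files are spotted by a content hash and kept once. The sketch below is illustrative only; `unique_bytes` is a made-up helper, the "deck" is a stand-in for a real file, and how a given cloud provider actually meters deduplicated storage is not something this code can tell you.

```python
import hashlib

def unique_bytes(attachments):
    """Bytes to store if byte-identical attachments are kept only once."""
    seen = {}
    for blob in attachments:
        # Identical content yields an identical digest, so repeats are skipped.
        seen.setdefault(hashlib.sha256(blob).digest(), len(blob))
    return sum(seen.values())

deck = b"slides" * 1000      # stand-in for one PowerPoint file
mailboxes = [deck] * 20      # the same file attached to twenty emails
total = sum(len(blob) for blob in mailboxes)
print(total, "bytes attached,", unique_bytes(mailboxes), "bytes of unique data")
# prints: 120000 bytes attached, 6000 bytes of unique data
```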

In our homes and offices duplication of paper-stored information is a fact of life and we don’t even think about it. In the digital world such duplication is a money-wasting sin and we should and must think about it, so we can spend our money where it can do good instead of being poured down a drain. ®
