
The cost of not deduping

Think pouring budget down drain


In our homes and offices, duplicated information is such a fact of life that we don’t even think about it. In the digital world, though, we can think about it, and should, because doing so can stop a lot of wasteful spending.

Imagine a department of twenty people. They each have their own copies of their employment contract and a pension scheme guide, filed in twenty desk drawers. More than ninety percent of the information in these documents is duplicated, but as individuals we don’t think much about it because we each have our own copy of the documents.

Now let's computerise this and give the twenty people electronic copies of their employment contract and pension scheme guide, stored on a departmental server’s disk drive and consuming 10MB of capacity per person; that’s 200MB. With deduplication technology we can detect that 90 per cent of the information exists in multiple copies and eliminate the extras, replacing them with pointers to a single master copy of the data. Each person keeps 1MB of unique data (20MB in all) and everyone shares one 9MB master copy, so our 200MB just decreased to 29MB, a roughly 7:1 deduplication ratio.
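To make the pointer idea concrete, here is a minimal Python sketch of hash-based chunk deduplication. It is an illustration under stated assumptions, not how any particular product works: the fixed chunk size, the function names `dedupe` and `restore`, and the made-up file contents are all hypothetical, and each 1KB chunk stands in for 1MB of the example above so that it runs instantly.

```python
import hashlib

CHUNK = 1024  # assumed fixed chunk size; each chunk stands in for 1MB of the example


def dedupe(files):
    """Keep one master copy of every distinct chunk and turn each file
    into an ordered list of pointers (chunk hashes)."""
    store = {}     # chunk hash -> chunk bytes (the single master copy)
    pointers = {}  # file name -> list of chunk hashes
    for name, data in files.items():
        refs = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # stored only the first time it is seen
            refs.append(digest)
        pointers[name] = refs
    return store, pointers


def restore(store, refs):
    """Rebuild a file by following its pointers into the chunk store."""
    return b"".join(store[d] for d in refs)


# Made-up data: 9 chunks shared by everyone, plus 1 chunk unique to each person.
shared = b"".join(bytes([i]) * CHUNK for i in range(1, 10))
files = {f"person{n}": shared + bytes([100 + n]) * CHUNK for n in range(20)}

store, pointers = dedupe(files)
raw_units = sum(len(d) for d in files.values()) // CHUNK
stored_units = sum(len(c) for c in store.values()) // CHUNK
ratio = raw_units / stored_units
print(f"{raw_units} units stored as {stored_units}, ratio {ratio:.1f}:1")
```

Real deduplication engines typically add variable-size chunking, on-disk indexes and integrity checks; the sketch only shows why twenty near-identical files shrink to one shared copy plus the unique parts.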


Deduplication technology can be applied with spectacular results to backup data, where, with repetitive daily and weekly backups, there can be a huge amount of redundant data. There can be terabytes of backup data stored on disk in a reasonably large organisation and effective deduplication ratios for full backups can approach 20:1. That means a 20TB backup store can be reduced to 1TB, and a 200TB one to 10TB.

Assume we are using 1TB disk drives; we don’t need 200 of them for the 200TB store; instead we only need 10. This is the main deduplication saving: you don’t need to buy as much disk capacity. If the backup storage array has RAID protection then you also need less capacity for the RAID parity or mirrored data.
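The drive arithmetic can be sketched in a few lines of Python. The RAID layout here (groups of eight data drives plus two parity drives, in the style of RAID 6) is purely an assumption for illustration, as is the function name `drives_needed`; the article does not specify a RAID level.

```python
import math


def drives_needed(data_tb, drive_tb=1, data_per_group=8, parity_per_group=2):
    """Return (data_drives, parity_drives) for a store of data_tb terabytes.

    The 8+2 group layout is an assumed, RAID 6 style configuration.
    """
    data_drives = math.ceil(data_tb / drive_tb)
    groups = math.ceil(data_drives / data_per_group)
    return data_drives, groups * parity_per_group


before = drives_needed(200)  # undeduplicated 200TB store
after = drives_needed(10)    # the same backups at 20:1
print("before:", before, "after:", after)
```

Under these assumptions the parity overhead shrinks along with the data drives, which is the point of the RAID remark above.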

If you send a copy of your backup data off-site for disaster recovery and business continuity reasons, then you only need a network link capable of transmitting 10TB in a reasonable time, rather than a much more expensive one sized for 200TB. And the destination data centre’s storage capacity requirement for this data is 10TB and not 200TB: another saving.
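A rough feel for the network side, as a Python sketch. The 1Gbit/s link speed is an assumption picked for illustration, decimal terabytes are used, and protocol overhead and contention are ignored, so treat the results as best-case figures.

```python
def transfer_hours(data_tb, link_gbit_per_s=1.0):
    """Best-case hours to send data_tb terabytes over a link of the given speed.

    Uses decimal units (1TB = 10**12 bytes) and ignores protocol overhead.
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 3600


deduped_hours = transfer_hours(10)
full_hours = transfer_hours(200)
print(f"10TB: {deduped_hours:.1f} hours, 200TB: {full_hours / 24:.1f} days")
```

On the assumed link the deduplicated copy fits inside a day, while the full copy takes weeks, which is why the alternative is a much faster and more expensive link.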

There are further savings as a result. Your data centre power costs go down because, instead of spinning 200 disk drives, you only need to have ten spinning. That means less heat is generated, so your data centre cooling bill is lowered as well.

With fewer disk drives spinning, the chance of one of them failing is lessened and your data are therefore more available.
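The failure claim can be illustrated with a simple probability sketch in Python. The 2 per cent annual failure rate is a hypothetical figure, and the model assumes drives fail independently, which real arrays only approximate.

```python
def p_any_failure(n_drives, annual_failure_rate=0.02):
    """Probability that at least one of n_drives fails within a year,
    assuming independent failures at the given (hypothetical) rate."""
    return 1 - (1 - annual_failure_rate) ** n_drives


p_200 = p_any_failure(200)
p_10 = p_any_failure(10)
print(f"200 drives: {p_200:.1%}, 10 drives: {p_10:.1%}")
```

With 200 drives a failure somewhere in the year is close to certain under these assumptions; with ten it is much less likely, so rebuilds and the risks that come with them are rarer.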

The savings from deduplication are multiplied by these effects.

The technology is beginning to be applied to nearline and primary data as well as to backup data, as processing power increases and the technology is improved. Multi-core storage controllers can deduplicate data pretty quickly and newer technologies such as Permabit’s Albireo take deduplication out of the data access path.

As and when data is stored in the cloud - remote data centres at the end of a wide area network link - then deduplication means you only pay for the storage of unique data there, and not multiple copies of a PowerPoint presentation or an image that has been identically attached to twenty emails. All those repeated clauses in the twenty HR employment contracts and pension scheme guides we mentioned earlier can be stripped out leaving just the unique data.

In our homes and offices duplication of paper-stored information is a fact of life and we don’t even think about it. In the digital world such duplication is a money-wasting sin and we should and must think about it, so we can spend our money where it can do good instead of being poured down a drain. ®
