The cost of not deduping

Think pouring budget down drain

In our homes and offices, duplicated information is such a fact of life that we don’t even think about it. In the digital world, though, we can think about it, and should, because doing so can stop a lot of wasteful spending.

Imagine a department of twenty people. They each have their own filed copies of their employment contract and a pension scheme guide. You can see them filed in twenty desk drawers. More than ninety per cent of the information in these documents is duplicated, but as individuals we don’t think much about it because we each have our own copy of the documents.

Now let's computerise this and give the twenty people electronic copies of their HR contract and pension scheme guide, stored on a departmental server’s disk drive, and consuming 10MB of capacity per person; that’s 200MB. With deduplication technology we can detect that there are multiple copies of 90 per cent of the information and eliminate them, replacing them with pointers to a single or master copy of the data. Our 200MB just decreased to 29MB (twenty lots of 1MB of unique data plus one 9MB shared copy), a roughly 7:1 deduplication ratio.
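The pointer-to-a-master-copy idea can be sketched in a few lines of Python. This is an illustration only, not how any particular product works: it assumes fixed-size blocks identified by a SHA-256 digest, and the names `dedupe`, `restore` and `BLOCK_SIZE` are made up for the example. Each tiny 4-byte block stands in for 1MB of the department's documents, so every "file" has nine shared blocks and one unique one.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny, one block stands in for 1MB (assumption)

def dedupe(files, block_size=BLOCK_SIZE):
    """Split each file into fixed-size blocks and store each distinct block once."""
    store = {}    # digest -> the single stored copy of that block
    recipes = []  # per file: ordered list of digests (the "pointers")
    for data in files:
        recipe = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy seen
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    """Rebuild a file by following its pointers back to the stored blocks."""
    return b"".join(store[digest] for digest in recipe)

# Twenty people: nine blocks common to everyone, one block unique to each person.
shared = b"".join(f"S{i:03d}".encode() for i in range(9))
files = [shared + f"U{n:03d}".encode() for n in range(20)]

store, recipes = dedupe(files)
logical_blocks = sum(len(recipe) for recipe in recipes)
stored_blocks = len(store)
ratio = logical_blocks / stored_blocks
print(f"{logical_blocks} blocks reduced to {stored_blocks}, about {ratio:.1f}:1")
```

Run as written, this reports 200 blocks reduced to 29, the same proportions as the 200MB and 29MB above. Real systems differ in block size, in how they choose block boundaries and in how they guard against digest collisions.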

In the digital world duplication is a money-wasting sin

Deduplication technology can be applied with spectacular results to backup data, where, with repetitive daily and weekly backups, there can be a huge amount of redundant data. There can be terabytes of backup data stored on disk in a reasonably large organisation and effective deduplication ratios for full backups can approach 20:1. That means a 20TB backup store can be reduced to 1TB, and a 200TB one to 10TB.

Assume we are using 1TB disk drives; we don’t need 200 of them for the 200TB store; instead we only need 10. This is the main deduplication saving: you don’t need to buy as much disk capacity. If the backup data storage drive array has RAID protection, then you also need a smaller amount of capacity for the RAID parity data and copied data.
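For anyone who wants to plug in their own numbers, the drive-count arithmetic is simple enough to sketch. The function name `drives_needed` is invented for the illustration, and it deliberately ignores RAID parity, hot spares and filesystem overhead, all of which would add to the real total.

```python
import math

def drives_needed(logical_tb, dedupe_ratio, drive_tb=1.0):
    """Whole drives needed to hold logical_tb of data after deduplication."""
    physical_tb = logical_tb / dedupe_ratio
    return math.ceil(physical_tb / drive_tb)

print(drives_needed(200, 1))   # no deduplication
print(drives_needed(200, 20))  # 20:1 ratio on a 200TB store
print(drives_needed(20, 20))   # 20:1 ratio on a 20TB store
```

With 1TB drives this gives 200, 10 and 1 drives respectively, matching the figures above.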

If you send a copy of your backup data off-site for disaster recovery and business continuity reasons, then you only need a network link capable of transmitting 10TB in a reasonable time, rather than a much more expensive one capable of carrying 200TB. And the destination data centre’s storage capacity requirement for this data is 10TB and not 200TB: another saving.
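To put a rough number on "a reasonable time", here is a back-of-the-envelope sketch. The 1Gbit/s link speed is purely an assumption for illustration, as is the helper name `transfer_hours`; the sum uses decimal terabytes and ignores protocol overhead, compression and contention on the link.

```python
def transfer_hours(data_tb, link_gbit_per_s):
    """Hours to send data_tb (decimal terabytes) over a link of the given speed."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600

LINK_GBIT_PER_S = 1.0  # assumed link speed, not a figure from the article

full = transfer_hours(200, LINK_GBIT_PER_S)
deduped = transfer_hours(10, LINK_GBIT_PER_S)
print(f"200TB: {full:.1f} hours, 10TB: {deduped:.1f} hours")
```

On that assumed link the undeduplicated store takes about 444 hours, or more than eighteen days, to send, while the deduplicated one takes about 22 hours.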

There are further savings as a result. Your data centre power costs go down because, instead of spinning 200 disk drives, you only need to have ten spinning. That means less heat is generated and so your data centre cooling bill is lowered as well.

With fewer disk drives spinning, the chance of one of them failing is lessened and your data are therefore more available.

The savings from deduplication are multiplied by these effects.

The technology is beginning to be applied to nearline and primary data as well as to backup data, as processing power increases and the technology is improved. Multi-core storage controllers can deduplicate data pretty quickly and newer technologies such as Permabit’s Albireo take deduplication out of the data access path.

As and when data is stored in the cloud - remote data centres at the end of a wide area network link - then deduplication means you only pay for the storage of unique data there, and not multiple copies of a PowerPoint presentation or an image that has been identically attached to twenty emails. All those repeated clauses in the twenty HR employment contracts and pension scheme guides we mentioned earlier can be stripped out leaving just the unique data.

In our homes and offices duplication of paper-stored information is a fact of life and we don’t even think about it. In the digital world such duplication is a money-wasting sin and we should and must think about it, so we can spend our money where it can do good instead of being poured down a drain. ®
