Feeds

Deduplication: a power-hungry way to streamline storage

Data wants to stay single

High performance access to file storage

Windows Server 8 is coming, and it is bringing storage enhancements with it.

Data deduplication in particular has caught my eye: it is something I have wanted on my Windows file servers for a long time.

This technology is nothing new; ZFS has had deduplication for a while now, and this technology is (experimentally) available with Linux’s Btrfs as well.

Worth consideration too is Opendedup, bringing deduplication to both Windows and Linux via SDFS.

The quick and dirty on deduplication is that it is an umbrella term for a set of technologies that allow you to store only one copy of a given piece of data on your hard drive, thus saving space and potentially speeding file writes. Essentially, it is single instance storage.

Deduplication can be done at the file level, the block level or the byte level. File and block level are the most common.

Need for speed

It can be done synchronously (as the writes happen) or asynchronously (as a scheduled job during quiet hours.)

Synchronous deduplication takes a lot of CPU power. So much power that high-end filer manufacturers are always clamouring for the fastest possible Xeons, and are pushing forward with research into making use of GPGPU technology.

It’s easy to imagine why. Try to compress 5GB of text files into a zip ball. Now, picture your hard drive as a half-petabyte zip ball that you are reading from and writing to at 10Gbit/s. Processing power is suddenly very important.

Despite this, deduplication is a critical technology. Storage demand has consistently outpaced capacity growth. What’s more, while hard drive capacity has trebled, network I/O and disk speeds have not.

This has potentially disastrous implications for both Raid rebuild times and backups. Deduplication can reduce the amount of information to Raid or backup, helping to ensure both of these processes occur in timeframes compatible with business needs.

Risky business

This is assuming that you are backing up the deduplicated blocks instead of the full file set. There are arguments for and against both.

Backing up the deduplicated blocks means less backup media is required and less bandwidth has to be set aside to perform the backups. On the other hand, it can increase restore times dramatically, as the entire set of backup media is now hopelessly interdependent.

Most people won’t back up data as deduplicated blocks – it is just too risky. The loss of one piece of backup media can render data irretrievable on all other media. This means budgeting backup bandwidth for the fully undeduplicated data to run every night.

You also have to budget your storage I/O bandwidth for the undeduplicated data size, not the size as it is stored on disk. The amount of data on disk may change only by a few dozen gigabytes a day, but the total storage I/O off that system could be measured in dozens of terabytes.

Mind the gap

Deduplication is necessary, increasingly so as the gap between storage demand and availability grows. But it doesn’t help decrease the need for network bandwidth, and it imposes a hefty processing requirement.

My next filer looks like its going to have a pair of top end Xeons and 10GBE. It will need to have two 10GbE ports, as I need to allow for MPIO.

Factor in sizing the filer to deal with demand peaks, ability to support snapshots, previous versions and other fun features, and the thought of planning my next storage refresh gives me a headache.

Difficult or no, time must be taken to do the research. The cost of storage and its attendant networking is such that few among us can afford to get it wrong. ®

High performance access to file storage

More from The Register

next story
Report: Apple seeking to raise iPhone 6 price by a HUNDRED BUCKS
'Well, that 5c experiment didn't go so well – let's try the other direction'
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Zucker punched: Google gobbles Facebook-wooed Titan Aerospace
Up, up and away in my beautiful balloon flying broadband-bot
Nvidia gamers hit trifecta with driver, optimizer, and mobile upgrades
Li'l Shield moves up to Android 4.4.2 KitKat, GameStream comes to notebooks
AMD unveils Godzilla's graphics card – 'the world's fastest, period'
The Radeon R9 295X2: Water-cooled, 5,632 stream processors, 11.5TFLOPS
FOUR DAYS: That's how long it took to crack Galaxy S5 fingerscanner
Sammy's newbie cooked slower than iPhone, also costs more to build
Sony battery recall as VAIO goes out with a bang, not a whimper
The perils of having Panasonic as a partner
NORKS' own smartmobe pegged as Chinese landfill Android
Fake kit in the hermit kingdom? That's just Kim Jong-un-believable!
Gimme a high S5: Samsung Galaxy S5 puts substance over style
Biometrics and kid-friendly mode in back-to-basics blockbuster
prev story

Whitepapers

Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
HP ArcSight ESM solution helps Finansbank
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Mobile application security study
Download this report to see the alarming realities regarding the sheer number of applications vulnerable to attack, as well as the most common and easily addressable vulnerability errors.