Delete all you like, but it won't free up space

You've been (de)duped ...

Comment: Networker blog author Preston de Guise has pointed out a simple and inescapable fact: deleting files on a deduplicated storage volume may not free up any space.

De Guise points out that, in un-deduplicated storage: "There is a 1:1 mapping between amount of data deleted and amount of space reclaimed." Space reclamation is also near-instantaneous. With deduplication, neither need be true.

Huh? Think about it. You add files to a deduplicated volume, and any blocks of data in them that are identical to blocks already stored get deduplicated out of existence and replaced by pointers. The file shrinks. This carries on as more files are added. The drive's capacity gets used up. You become aware of this. You start deleting files to reclaim space. You may find that much of the deleted files' originally fat content is actually skinny pointers, and you reclaim just a few bytes of space instead of megabytes or terabytes. Oops; you just got stuffed by deduplication.
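To make that concrete, here is a minimal sketch of a block-level deduplicating store. It is a toy under stated assumptions, not how any particular product works: the `DedupStore` class, the 4-byte block size and the file names are all invented for illustration, and the space taken by the pointers themselves is ignored.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: absurdly small blocks, purely for readability


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, illustrative only)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, each stored once
        self.files = {}   # file name -> list of fingerprints (the "pointers")

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # keep the block only if unseen
            pointers.append(fp)
        self.files[name] = pointers

    def used_bytes(self):
        return sum(len(b) for b in self.blocks.values())

    def delete(self, name):
        """Delete a file and return the bytes actually reclaimed."""
        before = self.used_bytes()
        pointers = self.files.pop(name)
        # Scan the remaining files for dependencies on these blocks.
        still_referenced = {fp for ptrs in self.files.values() for fp in ptrs}
        for fp in set(pointers):
            if fp not in still_referenced:
                del self.blocks[fp]  # only unique blocks free real space
        return before - self.used_bytes()


store = DedupStore()
store.write("a.bak", b"AAAABBBBCCCC")
store.write("b.bak", b"AAAABBBBDDDD")  # shares two of its three blocks
reclaimed = store.delete("b.bak")       # a 12-byte file goes away...
print(reclaimed)                        # ...but only the unique block is freed
```

In this sketch, deleting the 12-byte `b.bak` frees 4 bytes, because the other 8 are still needed by `a.bak`.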

Space reclamation with dedupe also requires the dedupe function to do some scanning once a file is deleted:

"Whenever data is deleted from a deduplication system, the system must scan remaining data to see if there are any dependencies. Only if the data deleted was completely unique will it actually be reclaimed in earnest; otherwise all that happens is that pointers to unique data are cleared. (It may be that the only space you get back is the equivalent of what you’d pull back from a Unix filesystem when you delete a symbolic link.)

"Not only that, reclamation is rarely run on a continuous basis on deduplication systems – instead, you either have to wait for the next scheduled process, or manually force it to start."
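The deferred part can be sketched the same way. In the toy below, which is an assumption-laden illustration rather than any vendor's design, `delete()` only drops pointers and a separate `reclaim()` pass, standing in for the scheduled or manually forced job, is what actually frees blocks. The class name, block ids and sizes are all hypothetical.

```python
class DeferredStore:
    """Toy store where deletion and space reclamation are separate steps."""

    def __init__(self):
        self.blocks = {}  # block id -> size in bytes
        self.files = {}   # file name -> set of block ids it points at

    def write(self, name, block_sizes):
        # block_sizes maps block id -> size; shared ids are stored once.
        self.blocks.update(block_sizes)
        self.files[name] = set(block_sizes)

    def delete(self, name):
        # Only the pointers go; no block is freed at this point.
        del self.files[name]

    def used_bytes(self):
        return sum(self.blocks.values())

    def reclaim(self):
        """Scan remaining files and free blocks nothing points at any more."""
        live = set().union(*self.files.values())
        dead = [b for b in self.blocks if b not in live]
        return sum(self.blocks.pop(b) for b in dead)


store = DeferredStore()
store.write("mon", {"x": 100, "y": 100})
store.write("tue", {"x": 100, "z": 50})  # block "x" is shared with "mon"

store.delete("tue")
used_after_delete = store.used_bytes()   # unchanged until reclamation runs
freed = store.reclaim()                  # the forced or scheduled pass
used_after_reclaim = store.used_bytes()
print(used_after_delete, freed, used_after_reclaim)
```

Here the used space does not move at all when `tue` is deleted, and the later reclamation pass frees only the one block that `mon` does not share.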

His conclusion is this:

"The net lesson? Eternal vigilance! It’s not enough to monitor and start to intervene when there’s say, 5 per cent of capacity remaining. Depending on the deduplication system, you may find that 5 per cent remaining space is so critically low that space reclamation becomes a complete nightmare."

He recommends the use of "alerts, processes and procedures targeting" a set of capacity utilisation levels such as 60 per cent, 70 per cent, 75 per cent and so on.
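A tiered check of that kind might look something like the sketch below. The function name and the example figures are made up for illustration; the 60, 70 and 75 per cent levels are the ones mentioned above, and anything beyond them is left for the reader to choose.

```python
def crossed_thresholds(used, capacity, levels=(60, 70, 75)):
    """Return every utilisation level (in per cent) reached so far.

    `used` and `capacity` must be in the same unit. Integer arithmetic
    is used so the comparison is exact at the boundaries.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [level for level in sorted(levels) if used * 100 >= level * capacity]


# Hypothetical volume: 720 units used out of 1,000, i.e. 72 per cent full.
alerts = crossed_thresholds(720, 1000)
for level in alerts:
    print(f"Utilisation has reached {level} per cent: act before it gets worse")
```

The point of several levels rather than one is that each can trigger a different procedure, starting well before space becomes critically low.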

Great idea. Preston de Guise is a clever guy. ®
