The Register® — Biting the hand that feeds IT

Comments on: HP duplicates deduplication

Can anyone explain..... 

Posted Tuesday 24th June 2008 05:20 GMT

"Deduplication strips out redundant data from files at the sub-file level. HP is claiming it can provide a reduction in backup file size of up to 50:1......."

Is this the same as lossless data compression (ZIP etc) or is it something new/extra/special?

"HP states that the data deduplication technology for the VLS and D2D enables customers to automate and remotely manage the systems with low-bandwidth replication. This provides data center managers the ability to back up data remotely without manual intervention, thereby reducing staffing costs."

Is this the same as saying "It's hard drive storage so there's no need for on-site staff to take care of loading tapes"? If so, why didn't they just say that?
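For anyone wondering how "sub-file level" deduplication differs from zipping each backup, here is a minimal sketch. It assumes fixed-size chunks indexed by SHA-256 digest; the article does not say how HP's products chunk or index data, so `dedup_store`, `restore`, and the 4 KiB chunk size are purely illustrative.

```python
import hashlib
import random

def dedup_store(streams, chunk_size=4096):
    """Split each stream into fixed-size chunks and keep each unique chunk once."""
    store = {}    # digest -> chunk bytes (hypothetical chunk index)
    recipes = []  # per stream: ordered digests needed to rebuild it
    for data in streams:
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # store only the first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    """Rebuild one stream from its list of chunk digests."""
    return b"".join(store[d] for d in recipe)

# Ten "nightly backups" of the same 40 KiB of incompressible data.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(10 * 4096))
backups = [base] * 10

store, recipes = dedup_store(backups)
logical = sum(len(b) for b in backups)
stored = sum(len(c) for c in store.values())
print(f"logical={logical} stored={stored} ratio={logical / stored:.1f}:1")
```

In this toy case the ratio comes purely from the same chunks recurring across backups, which is roughly why figures like 50:1 depend on how repetitive the backup set is. The data here is random, so an ordinary compressor run on one backup at a time would gain almost nothing.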

Isn't this just called "stream compression" - like zip/lzh/etc? 

Posted Tuesday 24th June 2008 10:53 GMT

LZH and friends are a family of "data deduplication algorithms"; Huffman coding and all that jazz are based on well-established principles of information theory, which is all about "de-duplicating" repeated data.

So, HP have a tape compression system. What's new about that exactly, apart from the name?

ZIP vs tar.gz 

Posted Tuesday 24th June 2008 12:12 GMT

Boffin

I have file collections that are 10 times smaller as tar.gz than as .zip

The difference is that zip compresses each file separately and starts fresh on the next one, while tar first packs all of the sub-files into one huge file and gzip then compresses that file as a whole. That lets it take advantage of repeats between files, and even in tar's per-file headers.
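The per-file versus whole-archive difference is easy to reproduce with Python's standard library. This is a rough sketch on made-up data (50 identical 500-byte files of pseudo-random bytes, with file names invented for the demo), not a benchmark of real file collections.

```python
import io
import random
import tarfile
import zipfile

# 50 small files with identical, individually incompressible content.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(500))
files = {f"file{i:02d}.bin": payload for i in range(50)}

def zip_size(files):
    """Size of a .zip where every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

def tar_gz_size(files):
    """Size of a .tar.gz where gzip sees the whole tar as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

zip_bytes = zip_size(files)
tgz_bytes = tar_gz_size(files)
print(f"zip: {zip_bytes} bytes, tar.gz: {tgz_bytes} bytes")
```

One caveat: gzip's DEFLATE only looks back 32 KiB, so repeats between files help only when the copies sit close together in the tar stream. With large files the advantage shrinks, which is where chunk-level deduplication across a whole backup set goes further.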

ManFromMars Alert! 

Posted Tuesday 24th June 2008 13:34 GMT

Alien

No, wait... it's not in the comments section, it's the start of the article. Ow!
