IBM demonstrates dedication to deduplication replication
Diligently developing Diligent
IBM is adding replication to its ProtecTIER deduplication product, which it gained by buying Diligent in April 2008.
ProtecTIER products will be able to replicate deduplicated data to a remote site, reducing customers' need to ship tapes off-site for disaster recovery. Transmitting deduplicated data means far fewer bits have to cross the link, lowering bandwidth requirements and, hopefully, costs.
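For a rough sense of the scale involved (the figures here are illustrative, not IBM's): a backup set that deduplicates at 10:1 leaves only around a tenth of the bits to push across the wire, so a 10TB nightly job shrinks to roughly 1TB of replication traffic before any compression is applied.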
IBM argues that this cost reduction means replication is not just viable for tier-one applications, but for all applications. The company thinks deduplication will become ubiquitous in data centres around the globe in the next five years.
The addition builds on the introduction of a ProtecTIER deduplication appliance in February this year.
ProtecTIER replication will be available from September 4, 2009 as a separately priced option; pricing was not revealed. It will be delivered as a software upgrade for existing TS7650G ProtecTIER Deduplication Gateway and TS7650 ProtecTIER Deduplication Appliance configurations. ®
That was my point, really.
Diligent (ProtecTIER) does >900MB/sec dedupe today, off two boxes of commodity hardware clustered together. It scales hugely (1PB, off the shelf) because it's less limited by memory-scaling problems than "traditional" (if there is such a thing) hash-based dedupe algorithms are.
Yes, ZFS will do dedupe for free (if you consider storage I/O, processor and RAM to be free). Diligent isn't free, but it's more effective than the mooted ZFS dedupe will be anyhow.
Forgive the slightly combative way of asking the question; I've just finished writing a whitepaper on this stuff, and comparing hash-based dedupe to Diligent's fingerprinting approach is rather like comparing the Ark to the Ark Royal.
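For context on the contrast drawn above, here is a minimal sketch, in Python, of the hash-index style of dedupe being criticised. It is illustrative only: not Diligent's fingerprinting and not any shipping product's code. Every unique chunk leaves a hash behind in an in-memory index, so the index, and the RAM it demands, grows with unique capacity; that is the memory-scaling pressure mentioned above.

    import hashlib

    CHUNK_SIZE = 8 * 1024        # assumed fixed-size chunking, purely for simplicity

    chunk_index = {}             # chunk hash -> position in the store; grows with unique data
    chunk_store = []             # stand-in for the backing disk

    def write_stream(data):
        """Split a byte stream into chunks and store only chunks not seen before."""
        refs = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest not in chunk_index:      # new chunk: store it and index its hash
                chunk_store.append(chunk)
                chunk_index[digest] = len(chunk_store) - 1
            refs.append(chunk_index[digest])   # existing or new, keep only a reference
        return refs

At a petabyte of unique data and 8KB chunks, an index like this runs to well over a hundred billion entries, which is why hash-based designs lean on large amounts of RAM or on clever index structures.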
I have no clue. ZFS dedup was recently announced in a talk; that's all I've heard. I think dedup has now been integrated into the ZFS code.
But I do know that the more drives you use, the higher the bandwidth: with 46 SATA 7,200rpm drives you can reach 2-3GB/sec read speeds, which is comfortably above 900MB/sec. I don't know how dedup will affect that, but I guess with a fast enough CPU it should be no problem, as ZFS uses no hardware RAID controller cards. Everything is done on the CPU.
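To put a back-of-envelope number on that: ZFS dedup, as described in the announcement, is block-level and checksum (hash) based, and keeps a dedup table with an entry per unique block that only performs well while it fits in RAM. The per-entry cost below is an assumption for illustration, not an official ZFS figure.

    DDT_ENTRY_BYTES = 320          # assumed in-core cost per dedup-table entry (illustrative)
    RECORD_SIZE = 128 * 1024       # ZFS default recordsize

    def ddt_ram_gb(unique_tb):
        """Rough RAM needed to hold the dedup table for a given amount of unique data."""
        unique_blocks = unique_tb * 2**40 / RECORD_SIZE
        return unique_blocks * DDT_ENTRY_BYTES / 2**30

    print(ddt_ram_gb(1.0))         # about 2.5GB of table per TB of unique 128KB blocks

Smaller average block sizes multiply that requirement, which is the memory-scaling issue raised earlier in the thread.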
Question: Does ZFS offer memory-based inline deduplication for free at >900MB/sec? Is it hash-based?