Magic hash maths: Dedupe does not have to mean high compute. Wait, what?

X-IO maths man claims it can minimise mill hash work with buckets of blooms

Thu 12 Oct 2017 // 09:06 UTC

Analysis: X-IO's new ISE 900 all-flash array has puzzlingly puny processors, yet it kicks out good performance while deduplicating data.

We wondered about that, and were pointed to a video of X-IO chief scientist Richard Lary presenting at a Storage Field Day in Denver earlier this year.

The maths is complex, but the points made are logical and show why X-IO's approach to dedupe doesn't need processors as powerful as those commonly found in other deduping arrays.

Video: Improving Deduplication via Mathematics with Richard Lary (Vimeo).

In the 37.5-minute video, Lary says dedupe is computationally intensive, but there are ways to make it less so without losing much deduplication efficiency, leaving more processor cycles for data access or other code.

To deduplicate, you calculate a computationally intensive and effectively unique mathematical hash, or signature, of an incoming lump of data and compare it with a table of existing hashes. If there is a match, the lump is a duplicate and can be replaced with a small reference to that hash, saving disk or flash space.
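To make that basic mechanism concrete, here is a minimal Python sketch of a content-addressed store. It assumes SHA-256 as the signature and a plain in-memory dictionary as the hash table; the class name `DedupeStore` is hypothetical, and this is a generic illustration, not X-IO's implementation.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store (hypothetical): identical chunks are kept once."""

    def __init__(self) -> None:
        # The dedupe hash table: signature -> stored chunk.
        self.chunks: dict[bytes, bytes] = {}

    def write(self, chunk: bytes) -> bytes:
        """Return a reference (the chunk's hash), storing the data only if it is new."""
        digest = hashlib.sha256(chunk).digest()  # computationally intensive signature
        if digest not in self.chunks:            # no match: unique data, store it
            self.chunks[digest] = chunk
        return digest                            # match: only the small reference is kept

    def read(self, ref: bytes) -> bytes:
        """Resolve a reference back to the stored chunk."""
        return self.chunks[ref]


store = DedupeStore()
refs = [store.write(c) for c in (b"A" * 4096, b"B" * 4096, b"A" * 4096)]
print(len(store.chunks))  # 2: the repeated 4 KiB chunk is stored only once
```

Every write pays for a full cryptographic hash, whether or not the data turns out to be a duplicate, which is the cost Lary targets.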

If there isn't a match, the data is unique: it must be stored and its hash added to the table. A system could have 10⁹ to 10¹¹ entries in its dedupe hash table.
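Some back-of-the-envelope arithmetic shows why a table that size hurts. Assuming (our assumption, not a figure from the talk) a 32-byte SHA-256 signature plus an 8-byte pointer per entry, the table runs from tens of gigabytes to terabytes, too big to hold entirely in DRAM on a modest controller, so lookups can end up going to flash:

```python
DIGEST_BYTES = 32   # assumed: SHA-256 signature size
POINTER_BYTES = 8   # assumed: location of the stored chunk


def table_bytes(entries: int) -> int:
    """Approximate hash table size, ignoring any indexing overhead."""
    return entries * (DIGEST_BYTES + POINTER_BYTES)


for n in (10**9, 10**11):
    print(f"{n:.0e} entries -> {table_bytes(n) / 1e9:,.0f} GB")
# 1e+09 entries -> 40 GB
# 1e+11 entries -> 4,000 GB
```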

Lary says deduplication is a drive space optimisation that incurs a performance penalty. An implementation can give up some space efficiency in return for a lower performance penalty.

How? He goes on to talk about proxy signatures, non-cryptographic signatures, bucketed hash tables, well-sized Bloom filters, an array of small blooms in a bouquet rather than a bucket, and then a bucket of blooms.
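Lary's exact structures aren't spelled out here, but a blocked Bloom filter, a well-known technique in which a key's hash first picks one small filter from an array and then sets a few bits inside it, gives a flavour of the idea. A Bloom filter never gives a false negative, so a "no" means a chunk is definitely new and the big hash table need not be consulted; a "yes" may be a false positive and still needs a real lookup. This sketch is hypothetical: the class name, the 512-bit block size, k=4 and the use of BLAKE2b are all our assumptions, and it may or may not resemble X-IO's bucket of blooms.

```python
import hashlib


class BlockedBloomFilter:
    """Hypothetical array of small Bloom filters; each key touches only one block."""

    def __init__(self, num_blocks: int, bits_per_block: int = 512, k: int = 4) -> None:
        self.num_blocks = num_blocks
        self.bits_per_block = bits_per_block  # kept a power of two so odd strides cycle fully
        self.k = k                            # bits set per key
        self.blocks = [0] * num_blocks        # each Python int serves as a bit array

    def _positions(self, key: bytes) -> tuple[int, list[int]]:
        h = hashlib.blake2b(key, digest_size=16).digest()
        block = int.from_bytes(h[:4], "little") % self.num_blocks
        # Double hashing: derive k distinct bit positions inside the chosen block.
        h1 = int.from_bytes(h[4:8], "little")
        h2 = int.from_bytes(h[8:12], "little") | 1  # odd stride
        bits = [(h1 + i * h2) % self.bits_per_block for i in range(self.k)]
        return block, bits

    def add(self, key: bytes) -> None:
        block, bits = self._positions(key)
        for b in bits:
            self.blocks[block] |= 1 << b

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means 'check the real table'."""
        block, bits = self._positions(key)
        word = self.blocks[block]
        return all((word >> b) & 1 for b in bits)


bf = BlockedBloomFilter(num_blocks=1024)
bf.add(b"signature-of-chunk-1")
print(bf.might_contain(b"signature-of-chunk-1"))  # True: no false negatives
```

Confining each key to one small block means a query touches one cache-line-sized region rather than bits scattered across the whole filter, at the price of a slightly higher false-positive rate than a classic Bloom filter of the same total size.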

In essence, his method calculates cheaper hashes and does less processing work deciding whether each one is unique. Deduplication space efficiency is compromised a little, but performance goes up significantly.
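One common way to cut hashing cost, shown here as an assumption rather than as X-IO's method, is to use a cheap non-cryptographic signature as a proxy and, because cheap signatures collide, to confirm any match by comparing the actual bytes before declaring a duplicate. This sketch uses the standard library's `zlib.crc32`; 32 bits is far too narrow for a table of 10⁹ entries, so a real design would want a wider fast hash, but the byte comparison keeps even this toy version from wrongly merging different data. `ProxyDedupeStore` is a hypothetical name.

```python
import zlib


class ProxyDedupeStore:
    """Hypothetical dedupe store keyed by a cheap proxy signature."""

    def __init__(self) -> None:
        self.table: dict[int, list[int]] = {}  # proxy signature -> ids of stored chunks
        self.chunks: list[bytes] = []          # the unique chunks themselves

    def write(self, chunk: bytes) -> int:
        """Return a chunk id, storing the data only if no true duplicate exists."""
        proxy = zlib.crc32(chunk)  # cheap, non-cryptographic signature
        candidates = self.table.setdefault(proxy, [])
        for cid in candidates:
            if self.chunks[cid] == chunk:  # verify: a proxy match may be a collision
                return cid                  # genuine duplicate: hand back a reference
        cid = len(self.chunks)              # unique data: store it and index it
        self.chunks.append(chunk)
        candidates.append(cid)
        return cid


s = ProxyDedupeStore()
ids = [s.write(c) for c in (b"x" * 4096, b"y" * 4096, b"x" * 4096)]
print(ids)  # [0, 1, 0]
```

The byte comparison costs a read, but only when incoming data looks like a duplicate, whereas the cheaper hash saves work on every single write.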

He also says deduplication systems waste a lot of resources cataloguing unique user data that will be overwritten in the near future. X-IO says it has a technique to shortcut this wasted effort but doesn't want to talk about it yet; if it works, X-IO's dedupe processing burden should get smaller still.
