Permabit's killer dedupe technology
Component dedupe for primary data storage OEMs
Fast primary data deduplication, a Holy Grail of storage vendors, is set to be a practical reality with Permabit's Albireo product, a software library that can be integrated as a component by storage and application OEMs.
Primary data deduplication has been seen as a CPU-cycle sucking monster that slows down application performance, applies to data with less redundancy than backup sets, and doesn't scale well due to the amount of potential data to be indexed in the search for duplicates. Yet the spread of virtualised multi-core servers sharing drive arrays means primary data sets are getting larger. The opportunities to improve array storage efficiency by removing block-level duplicates are becoming more attractive as this happens.
NetApp and Nimbus Data are the only storage array suppliers offering primary data deduplication, with NetApp's A-SIS in widespread use and the Nimbus Data flash-only product untried. Various ZFS-based products, such as ones from GreenBytes and a coming one from Compellent using Nexenta software, offer primary data deduplication but have yet to be tested with mainstream and critical business applications such as serving virtual desktop images (VDIs) to client desktops.
Permabit's marketing VP, Mike Ivanov, said: "One of the main drivers is NetApp implementing A-SIS. Everyone else needs to catch up ... What NetApp does now with primary dedupe is the only game in town and so widely successful ... But it's not that scalable ... only 16TB of indexing per volume. We need to scale to many hundreds of terabytes.
"ZFS is open source and puts all of hash tables put into standard filesystem metadata. It falls out of memory eventually, causing disk arrays to do multiple seeks. That's very inefficient. Putting metadata onto solid state drive (SSD) ... increases cost and still affects performance."
Deduplication without rehydration
Permabit, an archiving software supplier, has devised patented indexing technology with a low memory footprint that can function in single servers and multi-node grid systems, enabling it to scale to hundreds of terabytes of data. Conceptually it executes in parallel with the data path between a host server and the storage array, and is not an inline, bump-in-the-wire appliance. The software receives a copy of data for the array or in the array, generates unique content fingerprints with a SHA-256 hash, and checks its index to see if sub-file-level block groups have been stored already, meaning they have an index entry. If they have, then Albireo tells the array controller software so it can use that in its standard content metadata.
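The fingerprint-and-lookup step described above can be sketched in a few lines of Python. This is a minimal illustration only, not Permabit's patented index: the `FingerprintIndex` class, its `check` method and the 4KB block size are hypothetical names and values chosen for the example, and a plain dictionary stands in for the real index structure.

```python
import hashlib


class FingerprintIndex:
    """Toy in-memory index: SHA-256 fingerprint -> location first seen.

    Hypothetical stand-in for a dedupe index; a real index would not
    keep full digests in a Python dict.
    """

    def __init__(self):
        self._seen = {}

    def check(self, block: bytes, location: int):
        """Return the location of an earlier identical block, or None if new."""
        fingerprint = hashlib.sha256(block).digest()
        known = self._seen.get(fingerprint)
        if known is None:
            # New content: record where it lives so later copies can be detected.
            self._seen[fingerprint] = location
        return known


index = FingerprintIndex()
# (location, data) pairs as the array might hand them over; 4KB is an assumption.
writes = [(0, b"A" * 4096), (1, b"B" * 4096), (2, b"A" * 4096)]
advice = [(loc, index.check(data, loc)) for loc, data in writes]
print(advice)
```

The index only reports that location 2 holds the same content as location 0. What happens next is left to the caller, which mirrors the division of labour the article describes between Albireo and the array controller.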
The array controller software can then replace the duplicated blocks with a block reference. There is no need to rehydrate deduplicated data when it is next accessed because the array controller assembles blocks in response to a read request as it normally does - a block is a block is a block. If Albireo were an appliance then it would have to rehydrate data, as it would be the only place in the stack that knew how to find deduplicated blocks and rebuild their data.
Typically a storage array will respond to a read request by fetching blocks from disk to assemble the requested data. With Permabit's approach this is unchanged and when deduplicated blocks have to be fetched they are picked up like any other block. Ivanov said: "Block replacement is in the vendor's metadata … New stuff is added to index. Existing stuff is detected and Albireo sends a signal to the vendor's stack saying it is a known block at a known location. The vendor merges the blocks or extents and frees the allocated blocks for the deduplicated data. Different vendors do it differently and it works with their thin provisioning, whatever."
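A rough sketch of why no rehydration step is needed: if the array's own block map does the remapping, a read follows the same path whether or not a block has been deduplicated. The `ToyArray` class below is entirely hypothetical and not any vendor's metadata design. The byte-for-byte comparison before merging is an added safety assumption, not something the article says vendors do.

```python
class ToyArray:
    """Hypothetical array metadata: logical address -> physical block."""

    def __init__(self):
        self.physical = {}   # physical block id -> bytes
        self.block_map = {}  # logical block address -> physical block id
        self.refcount = {}   # physical block id -> number of logical references
        self._next_id = 0

    def _unref(self, pid):
        # Drop one reference and free the physical block when none remain.
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:
            del self.refcount[pid]
            del self.physical[pid]

    def write(self, lba, data):
        old = self.block_map.get(lba)
        pid = self._next_id
        self._next_id += 1
        self.physical[pid] = data
        self.refcount[pid] = 1
        self.block_map[lba] = pid
        if old is not None:
            self._unref(old)

    def merge(self, dup_lba, known_lba):
        """Act on a dedupe hint: share known_lba's physical block."""
        old, keep = self.block_map[dup_lba], self.block_map[known_lba]
        # Assumed safety check: only merge blocks that really are identical.
        if old == keep or self.physical[old] != self.physical[keep]:
            return False
        self.block_map[dup_lba] = keep
        self.refcount[keep] += 1
        self._unref(old)
        return True

    def read(self, lba):
        # Same code path for deduplicated and ordinary blocks.
        return self.physical[self.block_map[lba]]


array = ToyArray()
array.write(0, b"A" * 4096)
array.write(1, b"B" * 4096)
array.write(2, b"A" * 4096)
merged = array.merge(2, 0)  # hint: block 2 duplicates block 0
print(merged, len(array.physical), array.read(2) == b"A" * 4096)
```

After the merge, three logical blocks are backed by two physical ones, and `read` is unchanged.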
The technology works at a level below file or object storage and need not interfere at all with snapshots, replication or thin provisioning. It does not capture its own data and become a single point of failure, so it does not compromise data integrity.
Ivanov said: "The block interface has fixed chunks. File interface is stream of data to which we apply our segmentation engine and identify optimal boundaries for dedupe. Albireo is content-aware and can pull out images from office files and knows about tar files, things like that."
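To illustrate the difference between fixed chunks and boundaries chosen from the data itself, here is a sketch using content-defined chunking with a gear-style rolling hash. This is a commonly used technique swapped in for illustration. It is not Permabit's segmentation engine, whose workings the article does not describe, and it does none of the content-aware file parsing Ivanov mentions. The function names, the 256-byte fixed size and the 8-bit boundary test are all assumptions.

```python
import random

_rng = random.Random(0)
# One pseudo-random 32-bit value per possible byte value (illustrative table).
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(data: bytes, size: int = 256):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, bits: int = 8):
    """Cut after any byte where the top `bits` bits of the rolling hash are zero.

    The hash is never reset, so each boundary depends only on the
    preceding 32 bytes of content, not on the position in the stream.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


payload = bytes(random.Random(1).getrandbits(8) for _ in range(20000))
shifted = b"\x00" + payload  # the same stream with one byte inserted at the front

fixed_shared = set(fixed_chunks(payload)) & set(fixed_chunks(shifted))
cdc_shared = set(content_defined_chunks(payload)) & set(content_defined_chunks(shifted))
print(len(fixed_shared), "fixed chunks shared;", len(cdc_shared), "content-defined chunks shared")
```

A one-byte insertion moves every fixed boundary, so fixed chunks of a byte stream typically stop matching. Content-defined boundaries move with the data, so chunks beyond the first few dozen bytes still match and can be deduplicated.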
The technology can be used for inline dedupe, post-process dedupe or parallel deduplication, a combination of the two.
Scaling up and down
The amount of primary data to look at can be enormous, far larger than a single Albireo node can manage. Ivanov said: "We can provide a global index across multiple nodes and make use of memory in each node. It's transparent to the app whether we're a single local instance or distributed grid. We scale because of this multi-node scaling [and] we have self-healing capabilities.
"Petabyte scalability is key. Secondary storage deep only scales to a petabyte or more and cannot index very large amounts of unique data. … NetApp's A-SIS has a limit of 16TB per volume."
Albireo goes far beyond that. The indexing technology is the key to scaling in Ivanov's view and Albireo needs only 3.5 bytes in memory per index entry. He says it operates effectively in memory-constrained environments including SMB servers and small appliances, as well as in multi-node grid setups.
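A back-of-envelope calculation shows what 3.5 bytes per entry could mean in practice. The article gives no chunk size, so the 4KiB figure below is purely an assumption. The comparison with keeping a full 32-byte SHA-256 digest per entry is ours, not Permabit's.

```python
def index_memory_bytes(unique_bytes: int, chunk_bytes: int, bytes_per_entry: float) -> int:
    """RAM needed to hold one index entry per unique chunk."""
    entries = unique_bytes // chunk_bytes
    return int(entries * bytes_per_entry)


TIB = 2 ** 40
GIB = 2 ** 30
CHUNK = 4096  # assumed chunk size; the article does not state one

compact = index_memory_bytes(TIB, CHUNK, 3.5)     # figure quoted for Albireo
full_digest = index_memory_bytes(TIB, CHUNK, 32)  # a raw SHA-256 digest per entry

print(f"3.5 B/entry: {compact / GIB:.3f} GiB of RAM per TiB of unique data")
print(f"32 B/entry:  {full_digest / GIB:.3f} GiB of RAM per TiB of unique data")
```

Under these assumptions the compact index needs under 1GiB of RAM per TiB of unique data, against 8GiB for full digests. Larger chunks would shrink both figures proportionally.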
Ivanov says Albireo is fast: "It's at least ten times more efficient than anyone else. One in a thousand times we have to do a disk seek. Ninety nine per cent of index checks are in memory … We're delivering 140MB/sec per Xeon core or more … We can easily deliver thousands of MB/sec in a grid config." Watch out Data Domain.
Permabit sees this as OEM technology. In Ivanov's view: "The OEM has to do some development with our library but it ends up being far more attractive to them [than inventing it themselves or having an appliance] because we do not degrade performance or hide other functionality. The only competition is internal development."
Many OEMs have secondary data deduplication but: "Secondary data dedupe technology isn't necessarily best equipped to address primary space … dedupe is not dedupe when we target primary data market."
Ivanov said: "We're partnering up with larger and medium size storage OEMs to bring this to market. We've been working on it for six months and have some design wins. Some of them will possibly announce by the year end. We think there will be wide deployment in 2011. If your storage is not deduping in 2012 you will be at a disadvantage."
The core indexing technology was conceived by Jered Floyd, Permabit's co-founder and chief technology officer. The name Albireo comes from a star that looks like a single star to the naked eye but is revealed by a telescope to be a pair of yellow stars orbiting each other with a third and fainter blue companion. It's an analogy for deduplication: having multiple objects appear as one.
Will Albireo become a star? That depends upon a large number of storage OEMs taking it up, meaning companies such as 3PAR, Dell, EMC, HDS, HP, IBM, LSI, Pillar and so forth. Compellent is using Nexenta's ZFS and NetApp has A-SIS, so they're not on Permabit's list. It appears from our checks that 3PAR has no immediate plans to use Albireo. But it would only take an EMC or HP or IBM to embrace Albireo for the floodgates to open, with Permabit then becoming an effective industry standard.
If OEMs do adopt Albireo then, as well as helping them claw back NetApp's lead, it can be used for secondary data deduplication. That represents a threat to stand-alone deduplication vendors such as ExaGrid, FalconStor, Ocarina and Sepaton.
Ocarina has existing OEM deals with BlueArc, HDS and HP. It is also pursuing an embedded strategy, and is launching an embedded product relatively soon. The company has multiple OEM contracts concerning this technology and many new products are under development with these OEMs. Ocarina will be talking about these partnerships and products in the next few weeks.
The prospects are terrific and Permabit is really hyped up over its new technology. Industry stardom, glamour and a future trouser pocket-filling IPO all beckon. Albireo sure is seeing stars, but will it become one? ®