Dell's dedupe story still unfolding
Ocarina rocket scientists look to crack the block
Comment Dell's spreading of Ocarina dedupe goodness across its storage platforms has to overcome several obstacles, none of which are show-stoppers.
Ocarina deduplicates files – or more accurately, optimises and compresses files – and is content-aware so that it can work its specific magic on JPEGs and cast different spells on PACS images. It does not dedupe blocks and therein lies the rub.
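"Content-aware" means the optimiser first works out what kind of file it is looking at and then picks a format-specific method. Ocarina's codecs are proprietary and the article does not describe them. The Python below is only a minimal sketch of the routing step, assuming detection by leading signature bytes. JPEG files start with `FF D8 FF`, and DICOM (PACS) files carry `DICM` after a 128-byte preamble. The function names are hypothetical, and the format-specific branch is a placeholder that passes the bytes through unchanged.

```python
import zlib

# Hypothetical routing sketch; not Ocarina's actual code.
JPEG_MAGIC = b"\xff\xd8\xff"
DICOM_MAGIC = b"DICM"  # follows a 128-byte preamble in DICOM Part 10 files


def detect_type(data: bytes) -> str:
    """Classify content by its signature bytes."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if len(data) >= 132 and data[128:132] == DICOM_MAGIC:
        return "dicom"
    return "generic"


def optimise(data: bytes) -> tuple[str, bytes]:
    """Route data to a per-format handler (hypothetical)."""
    kind = detect_type(data)
    if kind in ("jpeg", "dicom"):
        # Placeholder: a format-specific codec would decode and re-encode
        # the image here. This sketch returns the bytes unchanged.
        return kind, data
    # Anything unrecognised falls back to a generic compressor.
    return kind, zlib.compress(data, 9)
```

The point of the dispatch is that a generic compressor gains little on image formats, so the real work happens in the per-format branches this sketch leaves empty.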
Dell wants to layer its Ocarina dedupe across its storage estate at both file and block level, and not exclude the Windows server-based DL disk-to-disk backup systems that run either CommVault or Symantec software. Darren Thomas, Dell storage VP and general manager, is confident this is all possible.
The file need is the easiest to meet, being Ocarina's home ground, so to speak. Dell has already announced NAS heads running its Dell Scalable Filesystem (DSFS): the NX3500 for PowerVault and the FS7500 for EqualLogic. One is coming for Compellent and another for the DX6000 object storage box. Dell will add the Ocarina dedupe technology to DSFS, and it then becomes available at the same time for every storage platform sitting behind a DSFS head.
Thomas said: "Think of it like a RAID feature. It's a dedupe feature for a file system."
That does not include the DL arrays, though. Dell understands that file data backed up from DSFS-headed arrays should ideally not have to be rehydrated before it is sent to the CommVault or Symantec DL products, but neither supplier's software supports Ocarina.
We understand that CommVault is thinking about how to import Ocarina-ised information and avoid the rehydration. Symantec, according to our research, understands the desirability of avoiding rehydration but is not as far advanced as CommVault in talking to Dell about the issue.
Darren Thomas said Dell represents 20 to 25 per cent of CommVault's revenues. We imagine this gives Dell's people significant mindshare at CommVault and fast access to its senior staff. Such access is harder to come by at the much larger Symantec.
Thomas said the two DL software suppliers also need to be able to export (restore) the backed-up data to third-party systems that can read Ocarina-ised data. Those systems would need the Ocarina Reader software, a relatively small piece of code that Dell could readily distribute.
Blocks are different. Blocks are difficult. The size of groups of blocks (chunks or pages) varies between Dell's storage platforms. On EqualLogic systems a page is 15MB, whereas on Compellent arrays it varies, and the 64-bit StorageCenter O/S will track at the block level. A block is not a complete file, although the storage O/S can in principle be queried about which blocks make up which files. Striping files across drives increases the fragmentation that a block-level deduper of primary storage has to deal with.
The larger the page or chunk size, the higher the probability of finding duplicated data within it.
Dell has its Ocarina dedupe rocket scientists working on this. These are the people who developed the original Ocarina algorithms for compressing data that other deduplication technologies could not touch, such as the various image file formats. They are developing algorithms to find and remove duplicated data in the pages or chunks, and also to recover the space released. It's no good rewriting a 15MB page with 3MB of empty space in it. Darren Thomas said: "If you compress 15MB of data to 12MB then you have to recover the space. Maybe this will mean concatenating compressed pages."
As we understand it, you would read in pages, dedupe them, and then write them back to disk as a continuous stream, with the array software breaking this stream up into pages again.
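To make the space-recovery problem concrete, here is a minimal Python sketch of that read, dedupe, concatenate and re-split cycle. Dell has not said how its algorithms work, so everything here is an assumption. It uses fixed-size pages, SHA-256 fingerprints to spot duplicate pages, and zlib as a stand-in compressor. `PAGE_SIZE` is an illustrative 64 bytes rather than EqualLogic's 15MB, and the function names are hypothetical.

```python
import hashlib
import zlib

PAGE_SIZE = 64  # illustrative only; EqualLogic pages are 15MB


def split_pages(data: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Break a byte stream into consecutive pages (the last may be short)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def dedupe_and_repack(pages: list[bytes], page_size: int = PAGE_SIZE):
    """Drop duplicate pages, compress the unique ones, concatenate them into
    one stream and re-split that stream into full-sized pages."""
    unique: dict[str, bytes] = {}   # fingerprint -> first page with that content
    page_map: list[str] = []        # logical page number -> fingerprint
    for page in pages:
        fp = hashlib.sha256(page).hexdigest()
        unique.setdefault(fp, page)
        page_map.append(fp)

    # Pack compressed pages back to back so no page is left part-empty.
    stream = bytearray()
    index: dict[str, tuple[int, int]] = {}  # fingerprint -> (offset, length)
    for fp, page in unique.items():
        blob = zlib.compress(page)
        index[fp] = (len(stream), len(blob))
        stream += blob

    packed = split_pages(bytes(stream), page_size)
    return packed, index, page_map


def read_page(packed, index, page_map, n: int) -> bytes:
    """Rebuild logical page n from the packed physical pages."""
    stream = b"".join(packed)
    off, length = index[page_map[n]]
    return zlib.decompress(stream[off:off + length])


pages = [b"A" * 64, b"B" * 64, b"A" * 64, b"C" * 64]
packed, index, page_map = dedupe_and_repack(pages)
print(len(pages), "logical pages stored in", len(packed), "physical page(s)")
```

Concatenating recovers the slack Thomas describes, but compressed pages no longer line up with physical page boundaries. That is why the sketch needs an offset/length index to find each logical page again. This extra metadata is a cost any real implementation built into the array O/S would also have to manage.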
Thomas said that once the duplicate-detection and space-recovery algorithms have been created: "We'll build it into the operating systems of EqualLogic and Compellent. At that point they become separate pieces of work."
The time frame for this effort is not clear-cut. Dell is confident that it can achieve the result it wants, and we think we should start seeing results in 12 months or so, if not before. El Reg thinks the file-level Ocarina dedupe could start showing up by the end of the year.
You get the feeling Dell is very happy to have put behind it the old days, when people were asking if Dell was really an innovative company. It has a substantial chunk of its own IP and is energetically developing it. Soon, no doubt, we'll be hearing about Dell's patent portfolio, and the research scientists at Ocarina will have contributed their part to it. ®