Ocarina making dedupe music with BlueArc
As well as appearing in a Legend of Zelda game
Hardware-accelerated file storage supplier BlueArc is to sell Ocarina deduplication hardware integrated with its Titan 3000 product.
BlueArc's Titan 3000 is a network-attached storage (NAS) product offering very fast access, up to 4PB of capacity, and tiered storage embracing fast Fibre Channel drives, bulk storage SATA drives, and WORM (Write Once Read Many) drives.
Ocarina is a startup producing post-process, block-level deduplication technology with the unusual ability to reduce the size of audio and image files using proprietary mathematical algorithms. Standard deduplication products from suppliers such as Data Domain, Sepaton, FalconStor and Quantum cannot dedupe JPEG and MPEG files as Ocarina's file-type-specific technology can.
A BlueArc customer takes files from the primary or upper tiers of the Titan and, using BlueArc's lengthily named Data Migrator with External Cross Volume Links, moves them on a policy basis to the Ocarina Optimizer for BlueArc hardware appliance. The appliance dedupes the files and passes them back to the Titan for storage in a lower tier. Files can be restored up the tiers by BlueArc's Dynamic Read Caching which, BlueArc claims, eliminates data latency as users access the data.
BlueArc also claims this will boost the Titan's effective capacity to as much as ten times that of the nearest competing primary-data dedupe offering, thought to be NetApp with its ASIS technology.
BlueArc hopes to find customers for this in the oil and gas discovery, media and entertainment, life sciences, visual effects, and Internet services markets.
Hitachi Data Systems has a reselling agreement with BlueArc and may well take this OEM addition to BlueArc's product line. Ocarina also has agreements with Isilon and HP for its technology.
BlueArc will make the Ocarina Optimizer for BlueArc available from mid-May onwards. ®
How is this supposed to work?
I thought the reason other dedupe offerings don't offer much benefit for media files (JPG, MPG, etc) is because these formats don't usually contain any redundant information. How can you dedupe data that already contains little or no duplication?
Of course, any dedupe product should be able to optimise multiple copies of the same media file (assuming they are verbatim copies), but if the files are different and, due to their already optimised (compressed, perhaps lossy) formats, don't contain any duplication - how can this work? I'm not saying it doesn't, just that I'd like some details on how it does.
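To illustrate the verbatim-copy point: a minimal sketch of conventional fixed-block dedupe (not Ocarina's actual method - they haven't published one). An exact copy shares every block hash with the original, while two different well-compressed files (modelled here as random bytes) share essentially none, which is why standard dedupe gains little on unique media files.

```python
import hashlib
import random

BLOCK = 4096  # illustrative block size; real products vary

def block_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size blocks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

random.seed(0)
# Random bytes stand in for already-compressed media: no internal redundancy.
media_a = bytes(random.getrandbits(8) for _ in range(64 * 1024))
media_b = bytes(random.getrandbits(8) for _ in range(64 * 1024))
copy_of_a = media_a  # a verbatim copy

a, b, c = map(block_hashes, (media_a, media_b, copy_of_a))

shared_with_copy = set(a) & set(c)      # every block is shared
shared_between_files = set(a) & set(b)  # effectively nothing is shared
```

So block-level dedupe collapses the copy entirely but does nothing for two distinct compressed files - the commenter's question about how Ocarina goes further stands.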
And do you really want to dedupe media files anyway? They're the very definition of "streaming media", and are usually written and read sequentially (i.e. at high speed). Surely any dedupe process will, by definition, reorder the underlying block structure and "de-sequentialise" the block layout?
ROFL - I just watched the "tech demo". I'm still in shock. Good luck with this one... :)
PS/ Chris C - I think you've misunderstood how dedupe works. There should be no increased risk due to multiple files sharing the same blocks. If one file is "lost" then only that file is affected, and the other 99 files sharing the same blocks are still intact. If, however, a disk corruption destroys a "block" of data, and that block was used by 100 deduplicated files, then "Yes", you are in a world of pain. That's why you only run dedupe on systems with niceties such as dual-parity RAID, media and parity error detection, lost write protection, etc.
Reducing size of AV?
Ocarina claims that this is a lossless format, but I could find no technical information doing a quick search. They simply say that they "optimize" based on the type of content, with "initial space savings range from 40% for complex image files to well over 70% for common office file mixes." Of course, they don't qualify that by saying how much data (and thus how much de-dupe) was used to generate those numbers, or any other details for that matter.
They do specifically mention that they break a file down into the file level, object level, and chunk level, and then "optimize" and "remove redundant information" from each of those levels. Without seeing any details or any tests, it does prompt the question -- is the reconstituted file an exact duplicate of its original form, or is it a restructured file which may *appear* to be the same (for example, the same internal components, but rearranged within the file without affecting the overall data)? The latter may not be bad for some people, but I would most definitely want an exact duplicate of what I put into the system.
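Ocarina hasn't published how its three levels work, but a purely hypothetical sketch of a file/object/chunk decomposition (all function names and sizes invented here) shows why fingerprinting at multiple levels lets duplicates be caught wherever they occur - whole files, embedded objects, or small chunks:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Short content hash used as an identity at every level."""
    return hashlib.sha256(data).hexdigest()[:12]

def decompose(data: bytes, object_size: int = 32, chunk_size: int = 8) -> dict:
    """Hypothetical three-level breakdown: file -> objects -> chunks.
    Identical pieces at any level get identical fingerprints, so only
    one copy of each distinct piece would need to be stored."""
    objects = []
    for i in range(0, len(data), object_size):
        obj = data[i:i + object_size]
        chunks = [fingerprint(obj[j:j + chunk_size])
                  for j in range(0, len(obj), chunk_size)]
        objects.append({"object": fingerprint(obj), "chunks": chunks})
    return {"file": fingerprint(data), "objects": objects}

# A file with a repeated internal pattern: its first object is four
# identical chunks, which collapse to a single stored chunk.
doc = b"AAAABBBB" * 4 + b"CCCCDDDD"
index = decompose(doc)
```

Note this only *indexes* redundancy; whether the reconstituted file is bit-for-bit identical - the commenter's question - depends entirely on whether the reassembly step preserves the original layout, which the sketch above does trivially and a real product would have to guarantee.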
And I have to say, I still don't understand this big push for de-duplication. I understand the desire to reduce the size of data sets, but de-dupe involves significant processing power, and it massively increases your risk. If you're running a standard, non-duped system and you lose a file, you've lost one file. If you're running an "optimized" duped system, and you lose the file that includes a portion of data deduped from 100 files, you've just lost 100 files. Yes, we all know about proper backups, RAID, etc. But de-dupe seems like too much of a risk to me.