Tape and dedupe: So not happening
Opinion Let's be provocative: the LTO Consortium has innovated in tape format capacity but elsewhere the tape world is moribund, mostly seeing tape as a cheap data tub with ever-increasing capacity. Where is the innovation we've seen in disk drive arrays?
Yes, we have tape media checking to ensure tape cartridges are storing data correctly, and dual robotics are entering the library product area. That's pretty much it.
A tape library is a huge great data tub with backup or pseudo-backup apps pouring data in as if through taps - just as they have done for years and years.
The only thing that's changed is that the tub has got bigger and its contents are held more reliably. "Big deal? Not really," the disk array vendors might say, sitting pretty atop innovative technology mountains built from striping data across multiple spindles, deduplication, RAID and tiering.
A RAID-like tape setup was once tried. It was called RAIT (Redundant Array of Independent Tapes) and involved multiple tape drives reading and writing simultaneously to compensate for low tape speeds. The management software was tricky to build and the tapes in the set used by the drives had to be managed as a set. It never caught on and tape speeds grew faster.
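To make the RAIT idea concrete, here is a toy sketch of striping blocks across several tapes with one XOR parity tape, RAID-style. It is an illustration only: the block size, the function names and the single-parity layout are assumptions for the example, not a description of how any shipping RAIT product worked. It does show why the tapes have to be managed as a set: every row of blocks spans all of them.

```python
from functools import reduce

BLOCK = 4  # bytes per block; tiny so the example is easy to follow (assumption)


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data, n_data_tapes):
    """Stripe data across n_data_tapes "tapes" plus one parity tape.

    Returns a list of n_data_tapes + 1 tapes, each a list of blocks.
    """
    row_bytes = BLOCK * n_data_tapes
    padded = data + b"\0" * (-len(data) % row_bytes)  # pad the last row
    tapes = [[] for _ in range(n_data_tapes + 1)]
    for off in range(0, len(padded), row_bytes):
        row = [padded[off + i * BLOCK: off + (i + 1) * BLOCK]
               for i in range(n_data_tapes)]
        for tape, block in zip(tapes, row):
            tape.append(block)
        tapes[-1].append(xor_blocks(row))  # parity block for this row
    return tapes


def read_back(tapes, length, lost=None):
    """Reassemble the data; rebuild one lost tape from the survivors."""
    survivors = [i for i in range(len(tapes)) if i != lost]
    rows = len(tapes[survivors[0]])
    n_data = len(tapes) - 1
    out = bytearray()
    for r in range(rows):
        for i in range(n_data):
            if i == lost:
                # XOR of the surviving data blocks and parity gives the lost block
                out += xor_blocks([tapes[j][r] for j in survivors])
            else:
                out += tapes[i][r]
    return bytes(out[:length])


data = b"archive data for tape"
tapes = stripe(data, 3)
damaged = list(tapes)
damaged[1] = None  # pretend the second cartridge is unreadable
print(read_back(damaged, len(data), lost=1) == data)
```

With three data tapes the drives can, in principle, stream three blocks at once, which is where the speed gain came from. The price is that losing track of any one cartridge in the set turns a simple read into a rebuild.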
Now it seems that tape format capacities are increasing faster than tape speed and it takes longer and longer to write or read each new tape format. The I/O density of tape is worsening and nothing is being done about it, unless we count LTFS as a way of getting specific data off tape faster.
Disk drive arrays have tiering, with the access rate of data - hot, medium or cold - pretty much dictating the speed of the storage tier it is kept on. There are placement software technologies to move data between the tiers, such as EMC's FAST and Compellent's Data Progression.
Tape has nothing like this, unless you count having various generations of a format in a library as tiering. But the format is not positioned in that sense and there's no automated movement of data between, as we might say, LTO-6, LTO-5 and LTO-4 "tiers".
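For the sake of argument, here is roughly what an automated placement policy across tape generations could look like. This is a hypothetical sketch: the day thresholds, the `Item` record and the function names are invented for illustration, and it makes no claim to resemble how FAST or Data Progression decide things on disk.

```python
from dataclasses import dataclass

# Hypothetical policy: days since last access -> tier, fastest tier first.
TIERS = [(30, "LTO-6"), (365, "LTO-5")]
COLDEST = "LTO-4"  # everything older than the last threshold lands here


@dataclass
class Item:
    name: str
    days_since_access: int
    tier: str  # tier the item currently sits on


def target_tier(days_since_access):
    """Pick the tier an item should live on, given how recently it was read."""
    for limit, tier in TIERS:
        if days_since_access <= limit:
            return tier
    return COLDEST


def plan_moves(items):
    """Return (name, from_tier, to_tier) for every item on the wrong tier."""
    return [(it.name, it.tier, target_tier(it.days_since_access))
            for it in items
            if it.tier != target_tier(it.days_since_access)]


library = [
    Item("payroll-2012", 10, "LTO-5"),
    Item("scans-2009", 400, "LTO-5"),
    Item("logs", 100, "LTO-5"),
]
for move in plan_moves(library):
    print(move)
```

The policy is the easy part. The hard part, and the bit nobody has shipped for tape, is the mover that would carry out those copies inside the library without an operator or a backup application driving it.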
A tape library is a flat, single tier storage space, an un-deduplicated space. Why?
Tape and deduplication
Data deduplication is a technology that would allow tape to trounce disk on data storage costs.
Let's take a 1.4TB LTO-4 tape format and assume a fairly conservative 20:1 dedupe ratio - it is archive data - which gets us to an effective capacity of 28TB. That's like turning a tape library into Doctor Who's Tardis, unimaginably larger inside than it looks from the outside.
Even if the cost of archiving on disk is the same as on tape, a highly debatable proposition - say one cent per MB for argument's sake - deduping tape gets the tape cost down to one twentieth of that, or one cent per 20MB. It blows disk away on cost.
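The sums above are simple enough to check in a few lines. The sketch below just restates them, using the article's own figures (1.4TB, 20:1, one cent per MB); the function names are made up for the example, and exact fractions are used so that rounding does not muddy the result.

```python
from fractions import Fraction


def effective_capacity_tb(native_tb, dedupe_ratio):
    """Logical data held per cartridge at a given dedupe ratio."""
    return Fraction(str(native_tb)) * dedupe_ratio


def effective_cost_per_mb(raw_cents_per_mb, dedupe_ratio):
    """Cost per logical MB once the stored data is deduplicated."""
    return Fraction(str(raw_cents_per_mb)) / dedupe_ratio


capacity = effective_capacity_tb("1.4", 20)  # the article's cartridge and ratio
tape_cost = effective_cost_per_mb(1, 20)     # one cent per MB before dedupe

print(f"effective capacity: {capacity} TB")
print(f"deduped tape: 1 cent per {1 / tape_cost} MB")
```

Swap in a different ratio to see how sensitive the argument is: everything scales linearly with the dedupe ratio, so the whole case rests on whether 20:1 is achievable for the data in question.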
Why haven't more vendors followed CommVault in putting deduped data on tape? Is it technically too hard?
Here's Quantum CTO Mark Himelstein's view: "Putting dedupe data on tape is possible ... LTO has multiple channels [and] you can partition the tape ... It's what you need. It is a reasonable idea ... The data [space] on the tape can be sufficient to be context-inclusive [referring to the dedupe metadata]. We do this already today with our DXi replication technology."
So if it is possible, why hasn't it been done? Is it a speed thing? Does having the dedupe metadata on tape slow down the rehydration of data? So, take advantage of tiering and have a few disk drives in the library holding the metadata? Or be really bold and hold it in flash.
Is it because having deduped data on tape makes that tape into a proprietary format and customers get locked in? For a 20X increase in tape capacity with no cost increase customers would surely look positively on that kind of lock-in.
Steve Mackey, SpectraLogic's sales veep for Europe and Africa, says: "The issue of dedupe is recovery. You've got to recover the whole tape or a set of tapes before you can recover a file. The big users of archive are looking to recover the data. Today I don't believe dedupe on tape meets the requirements for recovery."
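Mackey's objection is easier to see with a toy model. In the sketch below, files are cut into fixed-size chunks, each unique chunk is written once to whichever tape is currently being filled, and a file becomes a "recipe" of chunk hashes. Everything here is an assumption made for illustration (the class name, the tiny chunk size, the SHA-256 fingerprints, the in-memory index); it is not how CommVault, Quantum or anyone else actually lays deduped data out on tape.

```python
import hashlib

CHUNK = 8  # bytes per chunk; unrealistically small, for readability


class DedupeTapeSet:
    """Toy deduplicated store spread over a set of tapes."""

    def __init__(self, chunks_per_tape):
        self.chunks_per_tape = chunks_per_tape
        self.location = {}  # chunk digest -> tape number holding it
        self.chunks = {}    # chunk digest -> chunk bytes (stands in for the tapes)
        self.recipes = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        recipe = []
        for off in range(0, len(data), CHUNK):
            chunk = data[off:off + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.location:  # only new chunks consume tape
                self.location[digest] = len(self.location) // self.chunks_per_tape
                self.chunks[digest] = chunk
            recipe.append(digest)
        self.recipes[name] = recipe

    def tapes_needed(self, name):
        """Tapes that must be mounted to restore one file."""
        return sorted({self.location[d] for d in self.recipes[name]})

    def restore(self, name):
        return b"".join(self.chunks[d] for d in self.recipes[name])


store = DedupeTapeSet(chunks_per_tape=2)
store.write("a", b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD")
store.write("b", b"AAAAAAAAEEEEEEEEDDDDDDDD")
print(store.tapes_needed("b"))
```

File "b" is only three chunks long, yet two of them were already stored when "a" was written, so restoring it means mounting every tape that holds one of its chunks. The `location` and `recipes` tables are the dedupe metadata; keeping those on disk or flash speeds up the lookup but does nothing about the tape mounts.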
Tape is a venerable, long-lived and stable technology, but that's no reason for it to stand still. It's just stored bits in a recording medium; no technological holy writ says tape can be used in one and only one way.
Is tape moribund?
We're told that cloud storage will scale up to petabytes, and even exabytes, and that archives will have to be in the cloud because that's going to be the cheapest, most reliable, and easiest-to-manage place for them. So, cloud service providers, how would you like a 20X or even 30X increase in tape library capacity for no increase in tape media costs? Would that increase your profitability?
Good idea? You betcha, but don't hold your breath.
Integrated IT stacks
How about having integrated IT stacks including applications, compute, networking and storage, meaning disk storage and tape storage? How about if Oracle gets its hardware and software working better together by hooking up StreamLine tape libraries as the archival storage tier in an integrated, application-soup-to-tape-nuts data centre infrastructure stack?
Suppose Oracle hooked up StreamLine libraries via InfiniBand to its drive arrays and transferred deduped data from the arrays to the StreamLine with data-moving software it developed itself, and not that archaic technology known as backup software?
We're re-inventing the mainframe using commodity hardware and clever software, so why not bring in tape, which was a popular mainframe storage technology?
Who is better positioned than Oracle? SpectraLogic can't; it's a best of breed tape library supplier, like Quantum. HP could, in theory, but its biggest library is an OEMed version of Quantum's i6000 - hardly Premier League.
IBM could but does it have the vision? I think not. Aggressive, punchy, driven Oracle most probably does.
So was Overland Storage CTO, the fertile and inventive Geoff Barrall, right when he said tape was ripe for innovation?
Mackey thinks so: "With the resurgence of interest in tape as a long term archive I think there'll be a resurgence in innovation."
Come on, tape industry guys: get your fingers out and get this medium moving again. It can blow disk away as an archive storage medium and disk can't respond on a cost basis. So, what are you waiting for? Do you want Oracle to get the jump on you? ®