Tape and dedupe: So not happening

Why not?


Opinion Let's be provocative: the LTO Consortium has innovated in tape format capacity but elsewhere the tape world is moribund, mostly seeing tape as a cheap data tub with ever-increasing capacity. Where is the innovation we've seen in disk drive arrays?

Yes, we have tape media checking to ensure tape cartridges are storing data correctly, and dual robotics are entering the library product area. That's pretty much it.

A tape library is a huge great data tub with backup or pseudo-backup apps pouring data in as if through taps - just as they have done for years and years.

The only thing that's changed is that the tub has got bigger and its contents are held more reliably. "Big deal? Not really," the disk array vendors might say, sitting pretty atop innovative technology mountains built from striping data across multiple spindles, deduplication, RAID and tiering.


A RAID-like tape setup was once tried. It was called RAIT (Redundant Array of Independent Tapes) and involved multiple tape drives reading and writing simultaneously to compensate for low tape speeds. The management software was tricky to build and the tapes in the set used by the drives had to be managed as a set. It never caught on and tape speeds grew faster.

Now it seems that tape format capacities are increasing faster than tape speed and it takes longer and longer to write or read each new tape format. The I/O density of tape is worsening and nothing is being done about it, unless we count LTFS as a way of getting specific data off tape faster.
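To put rough numbers on that worsening I/O density, here is a minimal sketch. The two "generations" and their capacity and speed figures are hypothetical, chosen only to show capacity doubling while speed creeps up by a fifth; they are not real LTO specifications.

```python
# Hypothetical tape generations: (capacity in GB, streaming speed in MB/s).
# These figures are illustrative assumptions, not published format specs.
GENERATIONS = {
    "gen_a": (1000, 100),
    "gen_b": (2000, 120),
}


def hours_to_fill(capacity_gb, speed_mb_per_s):
    """Time to write (or read) a whole cartridge at full streaming speed."""
    seconds = capacity_gb * 1000 / speed_mb_per_s
    return seconds / 3600


def io_density(capacity_gb, speed_mb_per_s):
    """Throughput available per terabyte stored, in MB/s per TB."""
    return speed_mb_per_s / (capacity_gb / 1000)


for name, (capacity_gb, speed) in GENERATIONS.items():
    print(
        f"{name}: {hours_to_fill(capacity_gb, speed):.2f} hours to fill, "
        f"{io_density(capacity_gb, speed):.0f} MB/s per TB"
    )
```

Whenever capacity grows faster than speed, the hours-to-fill figure rises and the MB/s-per-TB figure falls, which is the trend described above.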

Disk drive arrays have tiering with the access rating of data - hot-medium-cold - dictating pretty much the speed of the storage tier it is kept on. There are placement software technologies to move data between the tiers, such as EMC's FAST and Compellent's Data Progression.

Tape has nothing like this, unless you count having various generations of a format in a library as tiering. But the format is not positioned in that sense and there's no automated movement of data between, as we might say, LTO-6, LTO-5 and LTO-4 "tiers".

A tape library is a flat, single tier storage space, an un-deduplicated space. Why?

Tape and deduplication

Data deduplication is a technology that would allow tape to trounce disk on data storage costs.

Let's take a 1.4TB LTO-4 tape format and assume a fairly conservative 20:1 dedupe ratio - it is archive data - which gets us to an effective capacity of 28TB. That's like turning a tape library into Doctor Who's Tardis, unimaginably larger inside than it looks from the outside.

Even if the cost of archiving on disk is the same as on tape, a highly debatable proposition - say one cent per MB for argument's sake - deduping tape gets the tape cost down to one twentieth of that, or one cent per 20MB. It blows disk away on cost.
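The sums are simple enough to check in a few lines. This sketch uses only the figures quoted above (1.4TB raw, a 20:1 ratio, one cent per MB); the function names are made up for illustration.

```python
# Figures from the text: raw cartridge capacity, assumed dedupe ratio,
# and the for-argument's-sake storage cost.
RAW_CAPACITY_TB = 1.4
DEDUPE_RATIO = 20
COST_CENTS_PER_MB = 1.0


def effective_capacity_tb(raw_tb, ratio):
    """Logical data a cartridge holds once duplicates are stored only once."""
    return raw_tb * ratio


def effective_cost_cents_per_mb(cost_cents_per_mb, ratio):
    """Cost per logical MB when each physical MB stands in for `ratio` MB."""
    return cost_cents_per_mb / ratio


capacity = effective_capacity_tb(RAW_CAPACITY_TB, DEDUPE_RATIO)
cost = effective_cost_cents_per_mb(COST_CENTS_PER_MB, DEDUPE_RATIO)
print(f"{capacity:.0f}TB effective, {cost:.2f} cents per MB")
```

Everything hangs on the ratio: at 10:1 the advantage halves, so the 20:1 assumption for archive data is doing the heavy lifting.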

Why haven't more vendors followed CommVault in putting deduped data on tape? Is it technically too hard?

Here's Quantum CTO Mark Himelstein's view: "Putting dedupe data on tape is possible ... LTO has multiple channels [and] you can partition the tape ... It's what you need. It is a reasonable idea ... The data [space] on the tape can be sufficient to be context-inclusive [referring to the dedupe metadata]. We do this already today with our DXi replication technology."

So if it is possible, why hasn't it been done? Is it a speed thing? Does having the dedupe metadata on tape slow down the rehydration of data? So, take advantage of tiering and have a few disk drives in the library holding the metadata? Or be really bold and hold it in flash.

Is it because having deduped data on tape turns that tape into a proprietary format and locks customers in? For a 20X increase in tape capacity with no cost increase, customers would surely look positively on that kind of lock-in.

Steve Mackey, SpectraLogic's sales veep for Europe and Africa, says: "The issue of dedupe is recovery. You've got to recover the whole tape or a set of tapes before you can recover a file. The big users of archive are looking to recover the data. Today I don't believe dedupe on tape meets the requirements for recovery."
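Mackey's recovery point can be shown with a toy model. This is a sketch under stated assumptions, not how any vendor's product works: the class name, the four-byte chunks and the two-chunks-per-tape limit are all hypothetical, picked to keep the example small. Each unique chunk is written to tape once, and an index (the dedupe metadata) records which tape holds it.

```python
import hashlib

CHUNK_SIZE = 4        # bytes per chunk; tiny on purpose (assumption)
CHUNKS_PER_TAPE = 2   # hypothetical cartridge capacity, in chunks


class DedupeTapeStore:
    """Toy dedupe store: unique chunks on 'tapes', plus a metadata index."""

    def __init__(self):
        self.tapes = [{}]   # each tape maps chunk hash -> chunk bytes
        self.index = {}     # chunk hash -> tape number (dedupe metadata)
        self.files = {}     # file name -> ordered list of chunk hashes

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.index:
                # New chunk: start a fresh tape if the current one is full.
                if len(self.tapes[-1]) >= CHUNKS_PER_TAPE:
                    self.tapes.append({})
                self.tapes[-1][digest] = chunk
                self.index[digest] = len(self.tapes) - 1
            recipe.append(digest)
        self.files[name] = recipe

    def tapes_needed(self, name):
        """Tapes that must be mounted to rebuild one file."""
        return sorted({self.index[d] for d in self.files[name]})

    def restore(self, name):
        """Rehydrate a file by fetching each chunk from its tape."""
        return b"".join(self.tapes[self.index[d]][d] for d in self.files[name])


store = DedupeTapeStore()
store.write("first", b"AAAABBBBCCCC")
store.write("second", b"CCCCDDDDAAAA")
print(store.tapes_needed("second"))
```

The second file adds only one new chunk to tape, which is the capacity win, but restoring it means mounting every tape that holds one of its chunks. That is the recovery cost Mackey describes, and it is why the index is the obvious candidate for faster storage.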

Tape is a venerable, long-lived and stable technology but that's no reason it should stay that way. It's just stored bits in a recording medium; tape can't only be used in one and only one way by some kind of technological holy writ.


Is tape moribund?

We're told that cloud storage will scale up to petabytes, and even exabytes, and that archives will have to be in the cloud because that's going to be the cheapest, most reliable, and easiest-to-manage place for it. So, cloud service providers, how would you like a 20X or even 30X increase in tape library capacity for no increase in tape media costs? Would that increase your profitability?

Good idea? You betcha, but don't hold your breath.

Integrated IT stacks

How about having integrated IT stacks including applications, compute, networking and storage, meaning disk storage and tape storage? How about if Oracle gets its hardware and software working better together by hooking up StreamLine tape libraries as the archival storage tier in an integrated application soup to tape nuts data centre infrastructure stack?

Suppose Oracle hooked up StreamLine libraries via InfiniBand to its drive arrays and transferred deduped data from the arrays to the StreamLine with data-moving software it developed itself, and not that archaic technology known as backup software?

We're re-inventing the mainframe using commodity hardware and clever software, so why not bring in tape, which was a popular mainframe storage technology?

Who is better positioned than Oracle? SpectraLogic can't; it's a best-of-breed tape library supplier, like Quantum. HP could, in theory, but its biggest library is an OEMed version of Quantum's i6000 - hardly Premier League.

IBM could but does it have the vision? I think not. Aggressive, punchy, driven Oracle most probably does.

So was Overland Storage CTO, the fertile and inventive Geoff Barrall, right when he said tape was ripe for innovation?

Mackey thinks so: "With the resurgence of interest in tape as a long term archive I think there'll be a resurgence in innovation."

Come on, tape industry guys: get your fingers out and get this medium moving again. It can blow disk away as an archive storage medium and disk can't respond on a cost basis. So, what are you waiting for? Do you want Oracle to get the jump on you? ®
