Original URL: http://www.theregister.co.uk/2011/03/18/tape_in_the_cloud/

How Google taped up its email outage wounds

Is there still a role for the reels ...

By Chris Mellor

Posted in Cloud, 18th March 2011 10:35 GMT

Comment Does tape have a role in cloud computing?

Ask cloud evangelists that question and they sit back, purse their lips, and say, "No, of course not ... but ..." The thing is, they tend to come from disk storage-biased suppliers or consultancies and are in love with virtualisation: the placing of abstraction layers between server apps and hardware, and between server apps and disk storage hardware.

Yet they have to admit tape storage is cheap, long-lasting and reliable, more so than deduplicated disk drive arrays pretending to be tape libraries. The financial numbers go in tape's favour but the emotional attachment to disk products works against it.

Arch cloud IT supplier Google found it necessary to rely on tape archival backup when there was a Gmail outage in late February. Google's Ben Treynor, an engineering VP, blogged about this and talked about software bugs affecting "several copies of the data". In other words, trashed data was "snapshotted", replicated etc, propagating the original fault.

Fortunately for users: "To protect your information from these unusual bugs, we also back it up to tape. Since the tapes are offline, they’re protected from such software bugs."

Well yes, of course, we all know that. There was also a dig at the speed of restoring from tape: "But restoring data from them also takes longer than transferring your requests to another data centre, which is why it's taken us hours to get the email back instead of milliseconds."

The basic story here is that tape archives got cloud evangelist Google out of a hole. Without tapes, never mind the restore speed issue, there would have been nothing to restore, and users would have lost email data.

Tape, with its 30-year lifespan, is the archive backstop for lost or duff data on disk. It's also a lot more cost-effective than disk for such use, a point repeated again and again by tape automation vendors.

What do tape automation vendors HP, Quantum and SpectraLogic say about cloud use of tape?

SpectraLogic

SpectraLogic VP of product management and marketing, Molly Rector, says: "Spectra Logic views cloud providers as customers, not competitors. Tape will be the strong, silent partner to the cloud – very much present and in use, just completely transparent to the end-user."

What markets in the cloud does she see for tape?

Public Clouds are most likely to be utilised by SMBs (small and medium businesses), primarily for economic reasons. Because of this imperative on public cloud providers to keep costs to a minimum, tape is likely to be the largest storage repository in these offerings because of the significantly lower cost compared to disk.

Hybrid Clouds are an interesting proposition [but] we don’t expect to see mid-market and enterprise customers adopting it. This approach may catch on at the lower end of the market where the financial benefits may outweigh concerns about regulation or security.

Private clouds are just another name for modern internal data centres. Regardless of the effect that server virtualisation, virtualised tier 1 storage and even network virtualisation has on the makeup of a data centre, backup and archiving are still major imperatives, and ones in which tape has an integral role to play.

Rector also provided some colour on the cost advantages of tape versus disk:

At the exabyte level, data deduplication may provide a 90 per cent reduction in total storage, but the annual costs of running 100PB of deduplicated storage is still going to be in the tens to hundreds of thousands of dollars just for heating and cooling annually.

That same exabyte’s worth of data can sit on idle tape cartridges and consume absolutely no power for the tape itself unless the data needs to be accessed, and very little power to maintain, monitor and cool the tape system as a whole.
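Rector's exabyte arithmetic can be checked in a few lines. The sketch below assumes decimal units (1 EB = 1,000 PB) and treats the 90 per cent figure as a flat reduction; the function and variable names are ours, not Spectra's.

```python
PB_PER_EB = 1_000  # decimal units: 1 exabyte = 1,000 petabytes


def stored_after_dedup(raw_pb: float, reduction: float) -> float:
    """Return the capacity still held on disk after deduplication."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return raw_pb * (1.0 - reduction)


# One exabyte of raw data with a 90 per cent deduplication saving
remaining_pb = stored_after_dedup(1 * PB_PER_EB, 0.90)
print(f"{remaining_pb:.0f} PB still on powered disk")
```

Even after a tenfold reduction, roughly 100 PB of disk still has to be powered and cooled, which is the point of the comparison with idle tape cartridges.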

We then asked Rector if tape should be used for backup or archive. She replied:

Tape and disk have different strengths in terms of speeds, capacities, and access methods, so depending on the configuration, disk or tape may be faster. Restoring a system is typically associated with backup and not archive, and tape is very fast at streaming that data back to the system. Spectra’s view is that tape is the right choice for archiving; and that disk is typically better suited for backup.

With a nod to the Gmail outage, she continued:

Tape also provides a cloud service provider with an added layer of security, as an offline copy of data is the only copy that is 100 per cent safe from a malicious attack [or self-imposed software update data corruption.]

In the light of the Gmail outage, that point should resonate strongly. One last point from Spectra: if customers' cloud data is on standard format tape then, if they want to change cloud service provider or exit the cloud, they can have the tapes shipped to them. That's just not feasible with disk arrays.

HP puts in its two cents

As one of the main tape automation format developers with DAT in its locker, plus LTO Consortium membership – and its position as a strong cloud evangelist – HP's views on tape in the cloud should be interesting.

Here is its starting position:

Cloud service providers should be aware that LTO tape technology is expanding its role from a pure backup solution to that of a premier long-term storage technology and archive.

The company echoes the Spectra view about the respective roles of tape and disk in backup and archiving. Asked about the relative costs of tape and disk, HP said:

At approximately 6 cents per Gbyte (native) for LTO-5 tape media, LTO Tape offers one of the lowest costs per Gbyte for long-term storage, particularly when factoring in energy costs. Tape storage is cool – literally – and has been shown to decrease storage power requirements by 99 per cent when compared with disk-based storage. Tape helps meet the goal of many data centres that inactive data should not consume energy.

Data archived on tape does not require the data centre floor space, power or cooling that's required for data stored on disk ... The Clipper Group Inc studied the total cost of ownership (TCO) of using disk or tape to archive large binary files ... [and] concluded that:

  • Disk is more than 15 times more expensive than tape for archival.
  • Disk uses 238 times more energy - costing more than the total cost of the tape solution.

The Clipper Group said: "In every dimension, the TCO of the tape solution was found to be less expensive than the TCO of the disk solution for long-term data retention, especially for energy consumption, where disk consumes 238 times as much energy as tape under assumptions that lean toward favouring disk. For most uses, we believe that the vast majority of archived data should reside on tape."
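The HP and Clipper figures can be cross-checked against each other. The sketch below assumes LTO-5's 1.5TB native cartridge capacity and decimal units; the variable names are illustrative only, and the cartridge price is derived from HP's per-Gbyte figure rather than quoted by HP.

```python
COST_PER_GB = 0.06      # dollars per GB (native), HP's approximate LTO-5 media figure
LTO5_NATIVE_GB = 1_500  # LTO-5 native cartridge capacity (1.5TB)
ENERGY_RATIO = 238      # Clipper Group: disk energy use relative to tape

cartridge_cost = COST_PER_GB * LTO5_NATIVE_GB   # implied price of one cartridge
media_cost_per_pb = COST_PER_GB * 1_000_000     # 1 PB = 1,000,000 GB
energy_saving = 1 - 1 / ENERGY_RATIO            # share of disk energy avoided by tape

print(f"Implied cartridge cost: ${cartridge_cost:.2f}")
print(f"Media cost per PB: ${media_cost_per_pb:,.0f}")
print(f"Energy saving versus disk: {energy_saving:.1%}")
```

A 238-fold energy ratio works out to a saving of a little over 99.5 per cent, which is consistent with HP's "99 per cent" claim.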

HP provided an example that illustrates how tape restore speed can be faster than disk restore speed:

When writing/recovering large quantities of archive data, tape's streaming rates (280MB/sec in the case of LTO-5) give tape a performance advantage over disk (SATA 3Gbit/s can support sustained throughput rates of 250-260MB/sec).
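To see what HP's quoted rates mean for a restore, the sketch below converts throughput into elapsed time. It assumes a single stream, decimal units and ideal sustained rates with no load, seek or network overhead; the 10TB restore size and the 255MB/sec disk midpoint are our illustrative choices. Note that 280MB/sec is LTO-5's rate with 2:1 compression; its native rate is 140MB/sec.

```python
def restore_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream size_tb terabytes at a sustained rate in MB/sec."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    size_mb = size_tb * 1_000_000  # decimal units: 1TB = 1,000,000MB
    return size_mb / rate_mb_per_s / 3600


RESTORE_TB = 10  # hypothetical restore size, not a figure from HP

rates = [
    ("LTO-5, 2:1 compressed", 280),
    ("LTO-5, native", 140),
    ("SATA 3Gbit/s disk", 255),  # midpoint of HP's 250-260MB/sec range
]
for label, rate in rates:
    print(f"{label:>22}: {restore_hours(RESTORE_TB, rate):.1f} hours")
```

On these assumptions tape edges ahead of the disk figure only at the compressed rate, so the size of any advantage depends on how compressible the archived data is.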

Quantum

Gabriel Chaher, senior director of international product and field marketing for Quantum, agreed with HP and Spectra on tape's cloudy role, and noted that, generally, "tape is part of a tiered storage solution."

Chaher addresses deduplication as an issue, citing a Clipper Group report, and saying:

[The] average disk storage is about 15 times the cost of average tape storage. The assumptions in the report do not account for a difference resulting from deduplication (for archive) because of the following reasons:

1. The nature (less dedupable) of target archival data.

2. The fact that any deduplicating is getting done upstream of the final archiving tier.

3. There would be a need to preserve the deduplication engine to be able to reconstruct the data, which may be stored for very long periods of time.

If it were to become practical to store deduplicated data on tape – say via a tape library with an integrated deduplication front-end server – then we would expect tape's capacity to substantially increase, further improving its costs relative to disk. But we would lose the portability associated with standard tape formats unless, and this is a huge "unless", the deduplication was done in a standard way.

Perhaps the three members of the LTO Consortium, HP, IBM and Quantum, could devise a way for this to happen?

On the marketing front, Chaher has this idea:

Maybe tape storage is a lower cost alternative that cloud providers could pass on to their customers. (Example: Pay $100 / TB / month for "fast access storage", but pay $75 / TB / month for "slower access storage").
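Chaher's example prices can be turned into a blended monthly bill. The sketch below is purely illustrative: the 100TB total and the 80 per cent share moved to the slower tier are our assumptions, not Quantum figures.

```python
FAST_PRICE = 100  # dollars per TB per month, "fast access storage"
SLOW_PRICE = 75   # dollars per TB per month, "slower access storage"


def monthly_bill(total_tb: float, slow_share: float) -> float:
    """Monthly cost when slow_share of the data sits on the slower tier."""
    if not 0.0 <= slow_share <= 1.0:
        raise ValueError("slow_share must be between 0 and 1")
    slow_tb = total_tb * slow_share
    fast_tb = total_tb - slow_tb
    return fast_tb * FAST_PRICE + slow_tb * SLOW_PRICE


all_fast = monthly_bill(100, 0.0)  # everything on the fast tier
tiered = monthly_bill(100, 0.8)    # hypothetical: 80 per cent moved to the slow tier
print(f"All fast: ${all_fast:,.0f}  Tiered: ${tiered:,.0f}  Saving: ${all_fast - tiered:,.0f}")
```

The saving scales with the share of data a customer is willing to leave on the slower tier, which is the commercial hook Chaher is describing.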

Quantum says it has "several online backup provider customers using our tape products but unfortunately we're not able to disclose which ones just yet."

Where does this leave us?

Tape is the final rescue for cloud disk storage screw-ups, like that of Gmail. There can realistically be no argument about this; it worked for Google when there was no alternative.

The financial arguments in favour of tape versus disk for long-term storage seem strong. Tape capacities, like disk capacities, are increasing, so tape's relative cost advantage should persist.

The use of tape reels in the cloud is already a reality, and one would think that the Google Gmail incident will ensure its continued and growing take-up. The front-end backup storage tier will be disk for fast access, but the archive tier – where data is infrequently accessed – should be tape, for reasons of cost, off-site protection and energy use. ®