Disk-pushers, get reel: Even GOOGLE relies on tape
Prepare to be beaten by your old, cheap rival
Comment Tape has spent some time on the ropes, but now it's back in the ring. After suffering five or more years of onslaught from pro-disk fanatics drunk on disk deduplication technologies, reality has struck home. Tape is cheaper than disk*. Tape is more reliable than disk and, the killer, tape's storage capacity can go on increasing for years.
In fact, IBM is currently preparing to demonstrate a near-40X improvement on LTO-6 capacity, while the rate of disk capacity increase is much slower.
Tape is the only storage medium that can confidently claim to keep up with the growing volume of transactions, images, music, compliance records, social media conversations and machine-to-machine messages being thrown at our IT systems. Sure, store it on flash and disk while it's hot, but bleed it off to tape when it's old and cold and can't be thrown away.
Tape, once the backup medium, is now reverting to an archive role. At the same time, the amount of data being archived is shooting up as data-generation rates increase.
There used to be more than a dozen tape and spool formats, and various methods of writing data to tape, but now there are just three main ones: IBM's, LTO and Oracle's, with a fourth, HP's DAT, that is fading.
IBM's proprietary TS1140 format stores 4TB of raw data on a cartridge, and Big Blue also makes a range of tape libraries to use the stuff, including its high-end TS3500 with 15,000 cartridge slots, robots to pick the cartridges and slip them into drives, and a total compressed-data capacity in excess of 2.7 exabytes. Mainframe tape is a near-captive market for IBM, although Oracle, with its acquired Sun/StorageTek technology, plays a strong role here too.
Oracle T10000 drive
Oracle's is the second surviving proprietary tape format. Its T10000d holds 8.5TB of raw data, and, like IBM, Oracle makes a range of libraries with its StreamLine 8500 at the high end. This holds 10,000+ tapes in a single system and more than 100,000 in 10 linked systems. Its most famous customer for this is now Google.
LTO, the Linear Tape Open consortium, was an invention of HP, IBM and Seagate, with the first sale by IBM in August 2000.
The idea was to create an open tape format for Windows and Unix servers, with media and drive products from all three consortium members that were interchangeable between them. The combination of openness and good technology decimated the other, proprietary tape formats for those servers, including DLT, Super-AIT, VXA and many others.
One after the other they all folded, leaving LTO supreme. Quantum joined the LTO consortium by buying Certance, Seagate's former tape arm, in 2005. Nowadays only IBM and HP make the drives, with Quantum selling LTO drives but not making them.
IBM, Oracle, HP and Quantum are four of the main tape library vendors, with SpectraLogic a fifth. Talking of a tape resurgence: Spectra sold 550PB of tape storage in the second half of 2012.
Tape is a sequential access medium, but IBM has created a virtual file-and-folder access system for it called LTFS (Linear Tape File System). This provides a drag-and-drop, Windows Explorer-style method of reading and writing files on tape.
It means ordinary users can, in theory, write files to tape and read files on tape without going through a backup software package, each of which has its own interface. This promises to revolutionise how tape is used, democratising, so to speak, its access.
That's the current tape technology state of play. But what's coming?
Each of the three main formats above has a roadmap out to one or two future generations which generally increase both capacity and data transfer speed. For example, LTO suppliers are currently shipping LTO-6, the latest generation. Sitting in the wings are LTO-7 and -8.
These increase both capacity and speed. LTO-7 provides 6.4TB of raw capacity (16TB compressed at a 2.5:1 ratio) and a 315MB/sec raw data transfer speed, which compares to LTO-6's 210MB/sec. LTO-8 should provide 12.8TB of capacity and a 427MB/sec transfer speed, both for raw data.
We might expect a new generation to become available 30 months or so after the preceding one; that's generally how it works. Historically, each LTO generation's drives can read and write the preceding generation's tapes, and read tapes from the generation before that, easing migration to the latest format.
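That backwards-compatibility rule is mechanical enough to sketch in code. A minimal illustration in Python, with generation numbers as the only inputs (the function names are ours, not anything from the LTO spec):

```python
def can_write(drive_gen: int, tape_gen: int) -> bool:
    """An LTO drive writes tapes of its own generation and the one before."""
    return drive_gen - 1 <= tape_gen <= drive_gen

def can_read(drive_gen: int, tape_gen: int) -> bool:
    """An LTO drive reads back two generations."""
    return drive_gen - 2 <= tape_gen <= drive_gen

# A new LTO-6 drive against a shelf of older cartridges:
assert can_read(6, 4) and can_read(6, 5)        # two generations readable
assert can_write(6, 5) and not can_write(6, 4)  # only one writable
assert not can_read(6, 3)                       # LTO-3 tapes need an older drive
```

So a shop still holding LTO-4 tapes can read them in an LTO-6 drive, but must rewrite them onto newer media before moving to LTO-7 drives.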
We predict that future generations such as LTO-9 and LTO-10 will again double transfer speed and capacity over their predecessors, not that the LTO consortium is committing to any such futures.
Oracle has a similar roadmap for its T10000 format. A coming T10000e could offer 12-20TB capacity and transfer speeds in the 400-600MB/sec range, though 300-350MB/sec might be more realistic. We're sure Oracle is more precise with its numbers when talking to its tape customers.
IBM? The same pattern we're sure, although Big Blue is coy about publicising it. El Reg storage desk also predicts that TS1150 and TS1170 formats could follow on from the existing TS1140 format, again with a general doubling of both capacity and transfer speed.
IBM has demonstrated a 35TB raw capacity tape. It is now preparing a demonstration of a tape holding 125TB, using a refinement of today's barium ferrite tape media rather than a totally new recording technology. Assuming that works, a TS3500 library using such tapes could hold 84 exabytes of data. It's a fantastically humungous amount of data. Nothing else will come close in terms of the cost/GB of storage, nothing.
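That 84EB figure is just today's 2.7EB maximum scaled by the jump in cartridge capacity, from 4TB TS1140 media to the 125TB demo media. A back-of-the-envelope check in Python, using only the figures quoted above:

```python
current_library_eb = 2.7     # compressed capacity of a maxed-out TS3500 today
current_cartridge_tb = 4.0   # TS1140 raw capacity per cartridge
demo_cartridge_tb = 125.0    # IBM's planned barium ferrite demonstration

# Same library, same slot count, bigger cartridges:
future_library_eb = current_library_eb * (demo_cartridge_tb / current_cartridge_tb)
print(f"{future_library_eb:.1f} EB")  # 84.4 EB
```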
Assuming LTO keeps doubling capacity every 30 months or so, we could see a 102TB LTO-11 in 12 to 13 years, in 2025-2026.
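The 2025-2026 date falls out of simple compounding: a new generation every 30 months and a capacity doubling each time, starting from the 12.8TB LTO-8 on the published roadmap. A sketch in Python (the cadence and the doubling past LTO-8 are our extrapolation, not an LTO consortium commitment):

```python
gen, year = 6, 2013.0       # LTO-6 is shipping now
while gen < 11:
    gen += 1
    year += 2.5             # ~30 months per generation

capacity_tb = 12.8 * 2 ** (11 - 8)   # double per generation beyond LTO-8
print(f"LTO-{gen}: ~{capacity_tb:.0f}TB raw, circa {year}")
# LTO-11: ~102TB raw, circa 2025.5
```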
How can we be sure this is reasonable and not a technological pipe dream? Because the physical size of a bit on tape is, relative to a disk bit, gigantic. There is simply lots of room to shrink tape bits without prejudicing bit stability, whereas disk bits written with current PMR recording technology are approaching that limit, forcing a move to a new recording technology. The 125TB tape project involves raising tape's areal density to 100Gbit/in2, which compares with the 620-690Gbit/in2 of the advanced disk drives in use now.
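To put a number on that headroom, compare the areal densities directly. A quick calculation in Python using the figures above:

```python
demo_tape_gbpsi = 100       # Gbit/in2 target for the 125TB tape demo
disk_gbpsi = (620, 690)     # Gbit/in2 of today's advanced disk drives

# Even at the demo density a tape bit is still several times bigger than a
# disk bit, so tape has shrink headroom that disk has already spent.
for d in disk_gbpsi:
    print(f"disk bit is {d / demo_tape_gbpsi:.1f}x denser")
# disk bit is 6.2x denser
# disk bit is 6.9x denser
```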
DAT's future is a fade into insignificance as disk and cloud backup generally replace it.
Tape is the archive medium and it is being used now by some of the biggest archivers of data around such as Google and Amazon.
Tape and the cloud
Six or so Oracle StreamLine 8500 tape libraries in Google's Lenoir data centre.
Also, the Googleplex recovered from an email outage in 2011 using tape. If tape is used by the famously cost-efficient and reliable Google IT operation, it's hard to think of a more ringing endorsement. Perhaps if Amazon endorsed tape? … It has, though not publicly.
The Glacier cloud archiving service started up last year by Amazon is based on SpectraLogic T-finity tape libraries. How many?
Amazon says that the Glacier service stores data in multiple facilities and on multiple devices within each facility. The Glacier service is available in just a few Amazon regions:
- US East - North Virginia
- US West - Oregon
- US West - Northern California
- EU - Ireland
- Asia Pacific - Tokyo
The number of data centres in each region varies with the number of availability zones in each region and is not generally publicised by Amazon. However, a bit of Googling shows five availability zones and 10 data centres in the US East (North Virginia) region. US West - Oregon has at least three data centres. To get a rough approximation let's assume two libraries per data centre and four data centres per region, which would indicate Amazon has something like 40 SpectraLogic libraries.
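That 40-library guess is nothing more than three multiplied assumptions, spelled out here so readers can swap in their own numbers:

```python
regions = 5              # Glacier regions listed above
dcs_per_region = 4       # working assumption: four data centres per region
libraries_per_dc = 2     # working assumption: two tape libraries per data centre

estimated_libraries = regions * dcs_per_region * libraries_per_dc
print(estimated_libraries)  # 40
```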
SpectraLogic has just announced sparkling results.
Two of the biggest cloud service providers in the world are using tape for archiving: its virtues of scale, reliability, performance and cost-effectiveness are evident enough for both to commit to buying significant numbers of libraries. What better customer references do you need?
The state of tape play
It is almost possible to say that tape's role and future in data centres with a significant data-archiving need is secure; we would want a few more examples like Google's and Amazon's before saying it outright.
(Facebook has a similar archiving need to Amazon and Google, but appears to be pursuing the idea of using solid state storage for its photo archive – prioritising archive access speed. It announced this at the Open Compute Summit in January.)
Tape in a library has an inherent access latency: cartridges have to be picked by a robot, transferred to a drive, the drive started, and the tape streamed to the right section of the ribbon. This will never be eliminated unless, like disks, each tape cartridge gets its own drive, a most unlikely development and one that would drive up tape costs significantly.
The most significant long-term threat to tape archiving could be cheap flash: three-bit-per-cell, triple-level cell (TLC) NAND. However, its future is clouded by NAND scaling issues (untested 3D NAND chip stacking is one way of boosting storage density once NAND cells can't be shrunk any further) and uncertainty about NAND replacement technologies such as Phase-Change Memory or Resistive RAM.
For decades a soldier's main weapon has been a device to fire a lethal projectile across a distance to the enemy. It looks likely to remain so as alternatives to the gun are simply not feasible and gun technology keeps on getting refined. Tape plays a similarly long-lived role in the data protection arsenal and its future looks just as assured.
Tape offers an escape from the excessive cost dead-end of deduplicating disk archives. ®
* A Clipper Group report (PDF) looked at the total cost of ownership of disk-based and tape-based systems and concluded tape delivers a significant TCO advantage over disk.