Plane or train? Tape or disk? Reg readers speak
Disk speed versus tape economy and removability
You the expert: Plane or train? We asked four Reg readers with storage smarts to say where and when we should use disk-based data protection, and where we should cross the line and use tape. Three did just that. The fourth identified another use case for tape and added a salutary reminder that tape has to be managed; it is absolutely not a start-the-backup-and-forget option. The consensus was that neither disk nor tape is sufficient on its own.
Disk is in because it's faster to back up to disk and restore from it, but tape is not out, not at all. It has substantial cost advantages, holding much more data for less money, and it can be stored off-line, even off-premises, making it better insurance against disaster striking a data centre. In general, disk has not replaced tape, and probably won't.
H Wertz - Freelance system administrator
I think the availability of disk-to-disk backup and virtual tape library (VTL) systems has reduced the need for tape, but tape still has an important role for archival purposes.
There are uses where a disk-to-disk backup or a VTL excels. One of these is where frequent retrievals are expected, such as when users often delete or overwrite files. (On a side note, VMS had/has a versioning file system, which inherently keeps older copies of files available for easy retrieval and simply removes the oldest ones as the disk fills up. But this is a very unusual feature; I have not heard of another system that has it.) When I was a student in the late 1990s, it would take me about 30 minutes to pull a file off the departmental DDS-3 tape drive, and closer to 4 hours (and a hefty fee) for ITS to pull a file off the backup of the main university systems.
Retrieval from a VTL would have taken a minute or two tops to find and retrieve a file. The big disadvantages of VTL? Since it is really an array of disks, software faults, hardware faults, or administrative faults could all render the library useless. In addition, the VTL would ordinarily be on-site.
Tape is still quite important for archival purposes, both for compliance and especially for disaster recovery purposes. A tape is written, then packed away and stored, so, unless tapes are re-used, those tapes provide an immutable record of what is on the system up to that point. Once it's ejected it won't be accidentally overwritten, erased, or modified. In addition, the tapes can be stored off-site, so in case of disaster the tapes won't be destroyed as the VTL could be. There are a few disadvantages, primarily "bit rot", the obvious retrieval speed disadvantage, and the cost of moving and storing tapes.
There are several technologies that could reduce the role of tape. First, internet backup allows for off-site storage without having to physically transport anything. However, like a VTL, it could allow backups to be modified or deleted. Additionally, if you're using a service provider (instead of your own second site), it would be a very good idea to verify that they have a robust set-up: in the last 10 years, one or two providers have been knocked off the face of the earth by a single hardware failure. Bandwidth is also a big issue: how long will it take to restore the entire system?
Second, there are systems that use removable hard disks. These provide the advantages of tape (a disk can't be accidentally modified if it's not plugged in, it can be stored off-site, and so on). In addition, they speed up file retrieval; someone still has to insert the right disk, but then retrieval is nearly instant.
The disadvantages? Potentially reliability (although I've had great luck hooking up disks that have sat doing nothing for years). The big one? Price: tape costs about a tenth as much per byte as hard disk.
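The bandwidth question can be answered with simple arithmetic before signing up with a provider. The sketch below is illustrative only: the function name `restore_hours`, the 80% link-efficiency default and the 2 TB / 100 Mbit/s example are all assumptions, not figures from any real service.

```python
def restore_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to pull data_gb gigabytes back over a link
    of link_mbps megabits per second.

    efficiency is the assumed fraction of the nominal line rate actually
    achieved (protocol overhead, contention, provider throttling); the 0.8
    default is a placeholder, not a measured figure.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    bits = data_gb * 1e9 * 8                       # decimal gigabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# Hypothetical system: 2 TB of data over a 100 Mbit/s link at 80% efficiency
print(f"{restore_hours(2000, 100):.1f} hours")  # prints "55.6 hours"
```

More than two days for a full restore is the kind of number that makes a courier with a box of tapes look fast.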
Mainframes are a special case. In terms of disaster recovery, they have supported synchronisation of services and storage between multiple sites for over 10 years, so in case of a disaster a backup system can be ready to go. Also, mainframes have extensive journaling, so in case of a problem the system can be rolled back to an earlier state. However, if the goal is to eliminate the potentially thousands of tapes, a VTL is also required, as mainframe software and procedures assume at least one tape drive.
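Journaling is what makes that roll-back possible: every change is logged together with the value it replaced, so changes can be undone newest-first back to a chosen point. The toy store below is a minimal sketch of that idea; `JournaledStore` and its methods are hypothetical names, and real mainframe journaling is far more elaborate.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

_MISSING = object()  # marks "key did not exist before this write"


@dataclass
class JournalEntry:
    seq: int
    key: str
    before: Any  # previous value, or _MISSING
    after: Any


@dataclass
class JournaledStore:
    """Toy key-value store that journals every change so it can be rolled back."""
    data: Dict[str, Any] = field(default_factory=dict)
    journal: List[JournalEntry] = field(default_factory=list)
    _next_seq: int = 1

    def write(self, key: str, value: Any) -> int:
        """Apply a change, record it in the journal and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self.journal.append(JournalEntry(seq, key, self.data.get(key, _MISSING), value))
        self.data[key] = value
        return seq

    def rollback_to(self, seq: int) -> None:
        """Undo every change made after sequence number seq, newest first."""
        while self.journal and self.journal[-1].seq > seq:
            entry = self.journal.pop()
            if entry.before is _MISSING:
                del self.data[entry.key]
            else:
                self.data[entry.key] = entry.before


store = JournaledStore()
store.write("balance", 100)
checkpoint = store.write("owner", "alice")
store.write("balance", 0)   # a bad update
store.write("temp", "x")
store.rollback_to(checkpoint)
print(store.data)  # {'balance': 100, 'owner': 'alice'}
```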
In conclusion, tapes may not have the high profile they used to, but they are still important for archival and disaster recovery purposes.
Henry Wertz graduated from the University of Iowa in 2000. He has been a Linux user since 1994 (loading off floppies makes a CD install seem like luxury!), and into cars (both fast ones, and ones that are highly efficient). He is currently doing freelance computer work. He is also a regular commenter on Reg stories.
Evan Unrue - Product Specialist at Magirus UK
What is the role of tape in IT departments, by and large, today? A number of storage vendors have made a big push for disk-based backup media as the cure for all backup-related ills, especially since the advent of technologies such as deduplication. However, tape-based technologies remain a staple of most businesses, from the SMB (small and medium business) through to the enterprise.
I don't necessarily see disk-based backup as the nemesis of traditional tape-based backup; the two (at least for today) appear more complementary. In my experience, most organisations today have a healthy balance of tape and disk technology to accommodate their backup needs. Typically, organisations run their initial incremental/differential backup jobs to disk and then take secondary copies of these jobs off to tape for longer-term retention (monthly/annual backup runs).
There are a number of reasons for this. First, disk is perceived as a faster medium, which helps keep backup jobs within defined backup windows. Second, most restore requests tend to be for data backed up within the previous week or month (the last known non-corrupt point in time, a recent accidental deletion, or recovery from a recent server failure).
Storing daily/weekly backup jobs on disk makes restoring data less painful. Tape does have its challenges compared to disk: tapes aren't always reliable, restores can be nail-biting moments, and performance is typically slower than that of disk.
However, tape definitely has its place. Portability is the key element in tape's continuing success, especially for those companies which don’t have a second site to replicate disk-based data to. Having the ability to eject a tape and store it at a secondary site, with a tape management outsourcing company, or even at home has a big appeal.
Tape also makes longer-term retention less painful for the wallet. Accommodating longer-term retention of backups on disk can be costly from a CAPEX perspective; moreover, disks keep spinning, so doing this comes with a larger physical footprint in the datacenter and a larger power bill. Tape scales by adding cartridges, which don’t spin when not in use and don’t take up space in the IT room as they scale (albeit the tapes need to be stored somewhere).
Disk-based deduplication technologies alleviate some of the cost implications of longer-term retention on disk, but they also come with a cost. The question is: how much is it costing you, financially and in man hours, to track, store and restore your tapes, versus the cost of a deduplicated, disk-based backup solution? It is also worth bearing in mind that some companies (or their regulatory bodies) mandate that a tape must be vaulted offsite for compliance purposes, in which case there is no avoiding tape.
Bearing in mind that backup environments tend to be quite sticky and troublesome to rip and replace, a lot of organisations are reluctant to refresh their backup infrastructure without a compelling event. Some disk-based backup technologies dictate that their own front-end backup software must be used, which is less than desirable unless you already want to move away from your existing backup software.
VTL functionality on many disk-based appliances, however, gives you many of the benefits of disk while masquerading as a tape library to the backup software, causing minimal disruption. This is all well and good, but you still have to manage virtual cartridges (which can’t be written to and read in parallel). So, by emulating tape, you can find yourself constrained by some of the limitations of how tape fundamentally works.
In conclusion, I don’t think it’s a case of disk versus tape, but in most cases of finding a balance between the two. New technologies such as disk-based deduplication may prompt a revisit of the above scenario, but it's always worth going through the process of determining the management, CAPEX and OPEX costs of your current set-up versus the newer technologies. Newer disk-based technology may have a higher CAPEX impact, but will it save you time and money in the long run?
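The disk-first, tape-second pattern described above can be written down as a daily schedule. This is a minimal sketch under an assumed policy that is not taken from any particular product: an incremental to disk every day, a full to disk on Sundays, and on the 1st of each month a secondary copy to tape, labelled "annual" on 1 January. The function name `jobs_for` is hypothetical.

```python
from datetime import date
from typing import List, Tuple


def jobs_for(day: date) -> List[Tuple[str, str]]:
    """Return (job, target) pairs for one day under a hypothetical
    disk-then-tape policy."""
    # Daily job goes to disk: full on Sunday (weekday 6), incremental otherwise.
    jobs = [("full" if day.weekday() == 6 else "incremental", "disk")]
    # Monthly/annual secondary copy of the latest disk backup goes to tape.
    if day.day == 1:
        label = "annual" if day.month == 1 else "monthly"
        jobs.append((f"{label} copy", "tape"))
    return jobs


for d in (date(2024, 1, 1), date(2024, 1, 7)):
    print(d, jobs_for(d))
# 2024-01-01 [('incremental', 'disk'), ('annual copy', 'tape')]
# 2024-01-07 [('full', 'disk')]
```

Restores of last week's data then come straight off disk, while the tape copies carry the longer-term retention.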
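One way to frame that question is a crude total-cost model that counts staff time alongside media and power. Everything in the sketch below is an assumption: the class and function names are hypothetical, the inputs are invented placeholders rather than real prices, and drive/library hardware and disk replacement are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass
class TapeOption:
    cartridge_tb: float          # native capacity of one cartridge
    cartridge_price: float       # price per cartridge
    vault_fee_per_year: float    # off-site storage and transport
    admin_hours_per_week: float  # tracking, rotating and test-restoring tapes


@dataclass
class DedupDiskOption:
    price_per_tb: float          # price per physical (post-dedup) terabyte
    dedup_ratio: float           # e.g. 10.0 means 10 TB logical per 1 TB physical
    power_per_tb_year: float     # power and cooling for disks that keep spinning
    admin_hours_per_week: float


def tape_cost(opt: TapeOption, retained_tb: float, years: int, hourly_rate: float) -> float:
    cartridges = math.ceil(retained_tb / opt.cartridge_tb)
    return (cartridges * opt.cartridge_price
            + opt.vault_fee_per_year * years
            + opt.admin_hours_per_week * 52 * years * hourly_rate)


def disk_cost(opt: DedupDiskOption, retained_tb: float, years: int, hourly_rate: float) -> float:
    physical_tb = retained_tb / opt.dedup_ratio
    return (physical_tb * opt.price_per_tb
            + physical_tb * opt.power_per_tb_year * years
            + opt.admin_hours_per_week * 52 * years * hourly_rate)


# Placeholder inputs only; substitute your own quotes and staff costs.
tape = TapeOption(cartridge_tb=1.5, cartridge_price=30, vault_fee_per_year=2000, admin_hours_per_week=3)
disk = DedupDiskOption(price_per_tb=500, dedup_ratio=10, power_per_tb_year=100, admin_hours_per_week=1)
print(tape_cost(tape, 150, 3, 40), disk_cost(disk, 150, 3, 40))
```

The answer swings heavily on the assumed dedup ratio and on how many hours a week the tapes really take, which is exactly why the man hours belong in the comparison.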
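The constraint that a virtual cartridge can't be written to and read in parallel can be modelled as an exclusive mount. The class below is a hypothetical sketch, not any VTL vendor's interface; it simply shows why a restore has to wait while a backup job still holds the cartridge.

```python
import threading
from typing import Optional


class VirtualCartridge:
    """Hypothetical model of a VTL virtual cartridge.

    Like a physical tape, it can be mounted by only one job at a time, for
    either reading or writing, so a restore cannot read a cartridge while a
    backup job is still writing to it.
    """

    def __init__(self, barcode: str) -> None:
        self.barcode = barcode
        self.mode: Optional[str] = None
        self._lock = threading.Lock()

    def mount(self, mode: str) -> bool:
        """Try to mount for 'read' or 'write'; return False if already in use."""
        if mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        if not self._lock.acquire(blocking=False):
            return False  # another job holds the cartridge
        self.mode = mode
        return True

    def unmount(self) -> None:
        if self.mode is None:
            raise RuntimeError(f"{self.barcode} is not mounted")
        self.mode = None
        self._lock.release()


cart = VirtualCartridge("VT0001")  # hypothetical barcode
cart.mount("write")                # nightly backup starts writing
cart.mount("read")                 # a restore request gets False and must wait
cart.unmount()
```

On disk there is no physical reason for this restriction; it is inherited purely from emulating tape.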
Chris Evans - Independent storage consultant
Many people feel tape no longer has a place in the enterprise data centre. This is not a belief I subscribe to; tape still has a place as part of an overall data management strategy.
A comprehensive data protection strategy will address the following scenarios:
• Data corruption
• Loss of data
• Loss of access to data
Depending on RTO (recovery time objective) and RPO (recovery point objective, the maximum acceptable data loss measured as the time since the last recoverable copy), it may be more appropriate to use disk solutions for short-term backups. Using disk will generally be quicker than tape but more expensive. The decision on whether disk or tape is best comes down to an analysis of RTO/RPO requirements.
RTO is a measure of the acceptable elapsed time taken to recover data. It will vary based on the importance of the data being recovered: in a disaster, for example, mission-critical data will be recovered well ahead of development data. As data ages, the RTO typically also increases, so one strategy is to use disk for short-term backup/restore requirements and move older backups to tape over time (otherwise known as disk-to-disk-to-tape, or D2D2T).
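The D2D2T reasoning reduces to one comparison per backup: keep it on disk only while its RTO is shorter than the time a tape restore would take. In the sketch below the service levels in `rto_hours_for_age` and the 12-hour tape restore time are invented placeholders, and both function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def rto_hours_for_age(age_days: int) -> float:
    """Hypothetical service levels: the older a backup, the longer a restore may take."""
    if age_days <= 7:
        return 4.0     # last week's backups: back within 4 hours
    if age_days <= 31:
        return 24.0    # last month's: within a day
    return 72.0        # anything older: within three days


def plan_tiers(backups: Iterable[Tuple[str, int]],
               tape_restore_hours: float = 12.0) -> Dict[str, str]:
    """Map each (name, age_days) backup to 'disk' or 'tape'.

    A backup stays on disk only while its RTO is shorter than the assumed time
    to recall and restore it from tape; otherwise it can migrate to tape.
    """
    return {name: "disk" if rto_hours_for_age(age) < tape_restore_hours else "tape"
            for name, age in backups}


print(plan_tiers([("monday", 1), ("mid-month", 20), ("q1", 100)]))
# {'monday': 'disk', 'mid-month': 'tape', 'q1': 'tape'}
```

A slower tape process (off-site vaulting, say) raises `tape_restore_hours` and keeps more recent backups on disk.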
As a storage medium, tape still has many advantages:
• It is relatively cheap compared to keeping disk arrays spinning and available.
• It is compact.
• It is portable.
Of course people point to some of these strengths as weaknesses too; tapes can be lost and many companies have received fines for breaches of regulations after tapes have gone missing. However tape content can be encrypted, mitigating this risk. For long-term backup, tape has clear advantages in cost.
Compared to tape, disk solutions have the advantage of additional functionality, such as de-duplication. In a VTL solution, for example, backups are shrunk by storing each duplicate piece of data only once and retaining references to that single physical copy on disk. Although de-duplication can reduce storage capacity (and therefore cost), it introduces a greater risk of data loss if a hardware failure occurs in the backup system, because many backups may depend on the same physical copy. It also introduces potential performance issues when multiple restores are performed from the VTL at the same time.
One area where tape is typically misused is in retaining old backups to create a data archive. Unfortunately, in many cases this isn’t the best approach, because:
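A minimal sketch of block-level de-duplication, assuming fixed-size 4 KB chunks fingerprinted with SHA-256 (production systems may use variable-size chunking instead, which is not modelled here). `DedupStore` is a hypothetical name. It also makes the risk visible: every backup that contains a chunk depends on its one stored copy.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy de-duplicating backup store using fixed-size chunks."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}       # fingerprint -> single physical copy
        self.backups: Dict[str, List[str]] = {}  # backup name -> list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)    # keep only the first copy
            refs.append(fp)
        self.backups[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.backups[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
day1 = b"A" * 4096 * 3
day2 = b"A" * 4096 * 2 + b"B" * 4096      # one block changed overnight
store.write("day1", day1)
store.write("day2", day2)
print(len(day1) + len(day2), store.physical_bytes())  # 24576 8192
```

Here five of the six logical blocks point at the same stored chunk; lose that chunk and both backups are damaged.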
• Tapes can become lost or damaged or lose their contents over time. An archive can’t afford to lose data. Many tape users simply retain multiple backups in the hope that this covers all data ever created. This is unlikely to be the case.
• Tape data is usually stored in the format of the backup product and so not easily searchable; at most the backup software will retain a list of files on tape but not metadata relating to the content.
• Tape's content isn’t easy to refresh. Data has to be physically copied out of the backup software to a format that can be backed up again by other backup software. A data archive therefore needs proper content management processes in place.
Generally, the more complex the data, the more unsuitable it is for long term tape archive.
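Tapes that lose their contents over time can at least be caught by a fixity check: record a digest for each item when it is archived, keep that manifest apart from the tapes, and compare against it whenever content is read back. The sketch below assumes in-memory `bytes` for brevity (real use would stream files), and `manifest` and `audit` are hypothetical names.

```python
import hashlib
from typing import Dict


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record a SHA-256 digest for each archived item at write time."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def audit(expected: Dict[str, str], recovered: Dict[str, bytes]) -> Dict[str, str]:
    """Compare content read back from the archive against the manifest.

    Returns a mapping of item name -> 'missing' or 'corrupt'; empty means clean.
    """
    problems = {}
    for name, digest in expected.items():
        if name not in recovered:
            problems[name] = "missing"
        elif hashlib.sha256(recovered[name]).hexdigest() != digest:
            problems[name] = "corrupt"
    return problems


m = manifest({"report.doc": b"q3 figures", "ledger.db": b"rows"})
print(audit(m, {"report.doc": b"q3 figures", "ledger.db": b"r0ws"}))  # {'ledger.db': 'corrupt'}
```

This detects loss but does not prevent it; an archive still needs a second copy to recover from.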
Another scenario gaining in popularity is using a cloud storage service to provide backup facilities. At present the use cases for cloud-based backup are limited, because a restore from the cloud will take longer than recovery time objectives allow. I see cloud backup as being more useful for home users and SMBs than for the enterprise.
In summary, tape still has a role to play in the data centre. It is one of many tools that can be used in deploying a comprehensive data management strategy. Tape retains advantages in cost, but has issues around effective management. With companies such as Google having to rely on tape for data recovery, we can be sure that tape has a future in the data centre for many years to come.
Chris M Evans is a founding director of Langton Blue Ltd. He has over 22 years' experience in IT, mostly as an independent consultant to large organisations. Chris's blogged musings on storage and virtualisation can be found at www.thestoragearchitect.com.
Oculus - IT Specialist
If you use tape, it has to be managed; simple, no? Oculus writes about a situation where tape was used to back up sensitive data but was not managed at all.
A project involved sensitive data. The sensitivity required that the data should be held on a server located within the project office, under suitable access controls and physical protection (which included the requirement that the server should be powered down and the disks moved into a safe at the end of the working day). Initially, the project was small but rapidly growing; the data was held on four hard disks arranged as two RAID-1 pairs (the disks are mirrored in pairs so that no data is lost in the event of any single disk failure).
Then the project became much larger, and the server was moved into its own “cage”. It had also been expanded, and a daily backup-to-tape instituted (the tapes were held in secure storage, and a rotation scheme was used so that at any time there were several daily and several weekly backups held on different tapes). As the backup took several hours to complete, it was run overnight (the “cage” affording sufficient physical security to allow the server to be run continuously), and the task of starting the backup run was assigned to any of several junior staff.
The backup routine persisted for several months without incident. But one morning someone took a closer-than-usual look at the console and found that the backup run had finished with an error condition: it had run out of tape. Checking the other tapes in the collection, the team found that every backup was incomplete: they had all run out of tape, and nobody had noticed. For at least a few weeks, and probably for several months, there had been no serviceable backups. Fortunately, there had never been occasion to attempt to restore data from the backups.
Lessons learned: Firstly, ensure that you are capable of restoring from your backups. Secondly, ensure that you check for error messages! Thirdly, make sure that any junior staff assigned to routine tasks are able to respond appropriately (e.g. by getting a more senior colleague to take a look) if anything unexpected happens.
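The second lesson can be partly automated so that errors reach a person rather than sitting unread on a console. The sketch assumes a hypothetical per-run record (status plus bytes expected and written); real backup software reports results in its own formats, and only an actual test restore satisfies the first lesson.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupRun:
    label: str            # job name or tape label (hypothetical)
    status: str           # "success", "error", ...
    bytes_expected: int   # size of the data selected for backup
    bytes_written: int    # size actually written to media
    message: str = ""


def failed_runs(runs: List[BackupRun]) -> List[str]:
    """Return one line per run that did not complete cleanly, for alerting."""
    problems = []
    for r in runs:
        if r.status != "success":
            problems.append(f"{r.label}: status {r.status} ({r.message})")
        elif r.bytes_written < r.bytes_expected:
            # Catches silent truncation even when the job claims success.
            problems.append(f"{r.label}: incomplete, {r.bytes_written} of {r.bytes_expected} bytes")
    return problems


runs = [
    BackupRun("Mon", "success", 100, 100),
    BackupRun("Tue", "error", 100, 60, "end of tape"),
    BackupRun("Wed", "success", 100, 80),
]
for line in failed_runs(runs):
    print(line)
# Tue: status error (end of tape)
# Wed: incomplete, 80 of 100 bytes
```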
Oculus is the pseudonym of someone who is not a specialist in storage; he is a specialist in another field, who has been around long enough to see how storage works – and how it fails. A feature of the project mentioned is that there was no option to use an external data centre, because the project used data which was too sensitive to trust to an outside agency. It was not appropriate to name names, projects or the firm.
The consensus is that tape drives will predominate in archive-focused data protection applications, with disk drives being preferred for backup apps; no surprises really, unless you are of the disk-can-do-it-all persuasion. From the archive point of view tape-based storage is much more cost-efficient, removable for off-site data protection and fast enough for the purpose.
Of course it would be good if it were faster still and, thankfully, newer tape formats transfer data faster than old ones, and coming formats such as LTO-7 will be faster still. If the disk-based backup array vendors manage to get the cost of a disk archive down to tape's level, that would erode tape's popularity as an archive medium, but tape would still have its removability advantage. ®