Got it taped: The business of tape-based disaster recovery
Taking a risky - or risk-free? - hike up Iron Mountain
The recovery position
Tape library machine costs aside, that works out at around £2,000 for all your backup storage over that period – under £3 a day. Although the needs will increase, the expansion is slow and steady.
"It’s typically when a client has databases that grow larger that you start using more. They might spill over from one to two tapes or five to six tapes. Hence, you have more tapes in your cycle, so after a while your stock of re-usable incoming tapes gets smaller and smaller.”
LTO-6 Ultrium tapes have yet to catch on but offer 6.25TB compressed capacity and claim a 30-year archival life
This pack of 20 notches up 125TB and comes to around £10 per terabyte
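The caption's figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (20 tapes, 6.25TB compressed each, roughly £10 per terabyte); the pack and per-tape prices it prints are derived, approximate values rather than quoted ones.

```python
# Figures taken from the captions; the price per terabyte is approximate.
TAPES_PER_PACK = 20
COMPRESSED_TB_PER_TAPE = 6.25
COST_PER_TB_GBP = 10

# Total compressed capacity of one pack.
pack_capacity_tb = TAPES_PER_PACK * COMPRESSED_TB_PER_TAPE

# Implied prices, derived from the approximate per-terabyte cost.
pack_cost_gbp = pack_capacity_tb * COST_PER_TB_GBP
cost_per_tape_gbp = pack_cost_gbp / TAPES_PER_PACK

print(f"Pack capacity: {pack_capacity_tb}TB")        # 125.0TB
print(f"Implied pack cost: £{pack_cost_gbp}")        # £1250.0
print(f"Implied cost per tape: £{cost_per_tape_gbp}")  # £62.5
```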
Chuckling, he adds, “And what normally happens is somebody notices: we haven’t got many tapes for so and so – are there any at Iron Mountain that can come back, are there any in the cupboard? Eventually, you get to the point where you need to order some tapes. It is fairly rare and might only happen a few times a year.”
Shifting both old and new tapes between the remote secure vault, the data centre and the tape machines themselves involves careful management and integration between the logging systems of High Fibre and Iron Mountain, which is where the labelling comes into play.
“I instigated a numbering system so I could identify the client just by looking at the number. We buy them blank and labelled with these numbers and barcodes. There’s an additional site sticker that goes on the top of the tape which identifies us amongst Iron Mountain’s clients. However, the tape library only reads the trioptic label. Any tape within our data centre is unique, so no two clients have the same numbered tapes. The backup administrators in India know which backup system the tapes are on and which client they belong to. The tape library is obviously connected to that backup server – logically and physically – with copper or fibre.”
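A numbering scheme of the kind Lloyd describes might be sketched as follows. The layout is purely hypothetical and is not High Fibre's actual scheme: it assumes an eight-character barcode made up of a two-character client code, a four-digit sequence number and a two-character media identifier such as L6. The client codes and names are invented for illustration.

```python
# Hypothetical label layout: CCNNNNMM
#   CC   two-character client code (assumed)
#   NNNN four-digit sequence number (assumed)
#   MM   media identifier, e.g. "L6" for LTO-6
CLIENT_CODES = {"10": "Client A", "20": "Client B"}  # invented examples


def parse_label(label: str) -> dict:
    """Split a barcode label into client, sequence number and media type."""
    if len(label) != 8 or not label[2:6].isdigit():
        raise ValueError(f"unexpected label format: {label!r}")
    code = label[:2]
    if code not in CLIENT_CODES:
        raise ValueError(f"unknown client code: {code!r}")
    return {
        "client": CLIENT_CODES[code],
        "sequence": int(label[2:6]),
        "media": label[6:],
    }


def find_duplicates(labels: list[str]) -> set[str]:
    """Return labels that appear more than once; the scheme expects none."""
    seen, dupes = set(), set()
    for label in labels:
        if label in seen:
            dupes.add(label)
        else:
            seen.add(label)
    return dupes


print(parse_label("100042L6"))
print(find_duplicates(["100001L6", "200001L6", "100001L6"]))
```

Putting the client code at the front is what lets someone identify the owner at a glance, while the duplicate check mirrors the requirement that every tape in the data centre carries a unique number.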
Buncefield. Source: MIIB, Chiltern Air Support Unit
So the clients can sleep easy in their beds with different backups: daily, weekly, monthly and even annually, with the longer periods kept for years. Lloyd recalls the sobering example of the Buncefield oil terminal explosion of 2005, which wiped out 92 neighbouring businesses and left 9,500 employees with no premises to work in. Among them was a data centre run by Northgate Information Solutions. Yet on the strength of its business continuity planning, it was able to retain its customers. It can be a very different outcome for those without a DR safeguard, as Lloyd points out.
"At Buncefield, you couldn’t get back to the premises for some weeks; in that time your business could go to the wall. With no DR strategy in place, a large number of companies who have a fire never get back into business again."
As mentioned earlier, the whole disaster recovery routine is tested at regular intervals, but to be effective, procedures need to be followed and the idiosyncrasies of the various components involved need to be accounted for in meticulous detail. Hal has first-hand experience of the sorts of issues that arise with live DR testing.
“You’ve got a physical replication of your SAN at the target site which is generally not in use. The server builds, the OS patching, that’s all up to date and ready to go. All it’s waiting for is the data on the tapes, and, of course, the switching of the communication links from one site to another.
“Some of the target systems have the same local IP addresses as the production systems, so when you’re pulling data off a tape, the host names and IP addresses are all the same. Also, various databases and applications have hard-coded IP addresses in them. I wish they hadn’t, but they do tend to do that in the SAP world.
“You have to be very careful when you switch over to the DR environment. The users involved have to be sure that they are not actually looking at the original production data and that they are looking at the DR data. Furthermore, they need to keep in mind that any updates that they might make are simply going to be lost. During a test we can’t have them working on a DR system thinking that they are working on the main production systems. So it’s all very tightly controlled.”
A full suite of tests will go on for several hours and at a specified point in the project, there’s a switchover back to the production system. The production communications are re-established and off you go, the DR’s left again for the next test. Now this might seem like a lot of horsepower idling away for six months in between tests but in Hal’s experience, clients will use their DR equipment for other projects.
“We have one that uses it for dev, tests and training and so on. But if a DR occurred on their main production systems they would effectively sacrifice the training, test and development environment. The DR needs would take over but that’s all manageable as you might have extra storage to preserve any important projects being run on these systems.”
Certainly, development and training are a very practical way to use such costly equipment that's being saved for a very rainy day, ready to be transformed into a fully functioning data centre by simply inserting a few tapes. Whether the end users know it or not, a vast range of enterprises still depend upon this extremely reliable and affordable medium to resurrect their fortunes when disaster strikes. The continued development of LTO drives, with ever-increasing capacities, suggests an enduring future, particularly as big data and the cloud reign over data centres whose business is to ensure that they have got it taped. ®