Have a Plan A and Plan B – just don't go down with the ship
Paranoia and obsessiveness will keep you afloat
Test and test again
The chap in question wouldn't be offended if I said he was one of those stereotypically poor documenters. Yet his Oracle disaster recovery plan was meticulous and surgically precise, and a welcome relief in a time of crisis. So why was that single piece of Oracle documentation so good?
Because it was rigorously tested, and the same should be true of your disaster recovery system (and, of course, its associated documentation).
The old adage "you're only as good as your last backup" is a popular one, but "you're only as good as your last restore" is even more pertinent. Whether it's a single backup or an entire fallback system, if you haven't tested it with real data and active use, you have no way to prove it will actually come to your rescue when you need it.
It can be difficult to initiate and perform a full disaster recovery test, in which the entire live system is failed over and back again to prove that it works, and getting business buy-in for something so disruptive can be next to impossible.
If you can sell the benefits and run a full live test over a disaster recovery day (or weekend, or week), then you will know for certain that your DR system, and crucially your plan, works in the real world. If you can't swing a full test, then testing one system or department at a time over an extended period is the next best thing.
Remember to treat any disaster recovery trial as a test of the documentation and the plan as much as of the system, and try to have someone who didn't write the plan perform the test. This is the perfect opportunity to try out your deputised sysadmins and to make sure the process works no matter who is at the helm on the day.
You, your team, and the business stakeholders will all sleep better at night as a result. ®