Millennium Buggery: When things that shouldn't be shut down, shut down
Happy New Year from the Who, Me? Vultures
Who, Me? Happy New Year! Because you'll all be feeling delicate, El Reg thought we’d ease your pain with some of our other readers' more technical errors – distraction is the best cure for embarrassment, we're sure.
Both are from the time of the approaching Millennium Bug, and as reader "Frank" puts it, "the imminent EOTWAWKI".
At the time, Frank was managing the network of a startup wireless ISP.
"We had a network of about a dozen cell sites across the Thames Valley, supporting a hundred-or-so customers," he said.
"Each cell site was driven by an ACC 'Amazon' (aka Newbridge Networks) router, all interconnected using E1 microwave links and using the OSPF routing protocol to keep everything glued together.
"Customer static routes would be added to these routers and redistributed via OSPF."
The day came when Frank had to put the first customer static route on the central hub router, which was a multi-slot Tigris router that linked to all of the spoke sites, and also to the internet.
"We were a startup and we built it on the cheap," he said. "You won't find a better definition of Single Point Of Failure."
No matter what Frank did, he couldn't get the static route to stick.
"Eventually I threw the problem at Newbridge TAC*, and a tech called me back to explain that I couldn't add the first static route while the OSPF process was still running," Frank recalled.
"Hang on," said Frank, and, because he had the terminal window open at the time, he typed in the command to disable OSPF... on a router to which he was connected remotely.
"It didn't even give me the courtesy of echoing the CR to me," Frank told us.
"I can still hear the incredulity in the tech's voice, obviously sensing my predicament, when he replied 'You didn't just do that on a live network, did you?'"
The central site was a 20-minute drive away, so Frank grabbed his laptop and legged it, stopping briefly to warn customer services that the phones were about to start ringing with people fearing their world was, indeed, coming to an abrupt halt.
Our second story of Millennium Buggery is from "Jim", who was asked to stay overnight, to protect a call centre from the dangers of the bug, as the company could ill afford for it to go down.
"I got sent a checklist of things that needed to be done in the server room, which boiled down to 'shut it all down'," he said. "So I did."
Almost immediately, the phones started ringing off the hook, with calls from the UK, from whence the firm was controlled; the Netherlands, where the networks were controlled; and the US, which was the parent body.
"We've lost connection to you, what's happening?" came the screams down the lines.
Jim told them he'd followed orders and had shut the server room down – but was informed in no uncertain terms that they didn't mean for him to shut down all the routers.
"I double-checked the CC list on the email, and asked them to show me where it said that I was not to shut down the routers," Jim said.
"There was no response because there was no exception for the routers in the instructions."
The other end of the line stayed quiet, as the flummoxed callers no doubt silently screamed at our literal Jim.
When the call centre was down, Jim told us, it cost the company about £50,000 an hour. "I suppose that was the reason for all the unhappiness."
We hope your new year's celebrations didn't end with such costly or stressful mistakes. But if they did, don't forget, you can tell Who, Me? here. ®
* Technical Assistance Centre