Fire alarm sparked data centre meltdown emergency
All fun and games till the shutters come down
This Damn War
Fire alarm tests are a good idea; you generally want the warm feeling that when something decides to combust, you'll be able to tell people about it with a loud ringing or wailing noise.
I used to run what you might consider a traditional machine room. We had a pile of ageing Sun kit – socking big CPU units and cabinets full of disk arrays. All of the expensive stuff was really sensitive to environmental anomalies and would happily cook itself in under an hour if the air-con failed. These were the days before journaled filesystems, and if something was shut down ungracefully it would be half an hour of disk checks before it finished rebooting.
Now, our fire alarm had some cunning add-ons. We had a very powerful air-con system, with extra protection for the comms room to prevent a fire spreading in there through the air-con ducts: when the alarm was activated, metal shutters would drop loudly over the grilles. And because the kit would obviously overheat in the absence of air-con, the system was also designed to kill the power to the comms room.
Of course, the alarm panel had a keylock on the front, with two positions: “Normal” and “Test”. And if the alarm was activated in the Test position, it would make a noise but wouldn't drop the shutters or kill the power.
One afternoon I was bashing away in a Telnet session and the fire alarm went off as scheduled. Oddly, the characters I was typing suddenly stopped appearing in the terminal window. So I fired up a window to a different machine, and that wasn't responding either.
Thinking something was a bit weird, I trotted off downstairs to see what was going on. Our head of maintenance was standing outside the plant room (which was next door to my server cave) with a bemused expression. It didn't take a genius to realise that we'd had a full shutdown rather than a test one.
Except the maintenance manager showed me that the key was definitely in the Test position. Very puzzling. So I turned off the power switches on all my kit, leaving just a desk lamp plugged in, and had another bash at the alarm test. Sure enough, a big rattly noise scared the crap out of us (those galvanised steel shutters were heavy, and we'd forgotten that they fell with a heck of a clatter) and off went the power.
Out came the schematics of the alarm panel, and we traced it through. It had a bunch of electro-mechanical relays that controlled the various functions in the event of an alarm, and we figured that relay number 18 was the one that isolated the shutter/power circuit when the key was turned to Test – so there had to be something wrong with it.
The screwdriver was applied to the front panel of the alarm system, and when we pulled it off we saw the collection of 20-odd relays – little square doohickeys with transparent plastic covers. We tried to figure out which relay was which, as none of us had ever had the front panel off before.
As we discussed it, Stuart (our maintenance guru) tapped me on the shoulder and suggested that there was probably an easier answer. The giveaway was the relay whose cover had turned into a melted black gooey mess. We agreed that this was probably the faulty one. We pulled it off and sure enough, the number 18 was on the circuit board underneath.
A week or so later the relay had been replaced. We figured it would probably be a reasonable idea to do a controlled shutdown and test the alarm with just a desk lamp again. So we switched to Test, pressed the button, and waited for the clatter. The alarm wailed, the light stayed on, and the shutters stayed open.
Happy days. All was well once more in the world of fire alarms. ®