Divert the power to the shields. 'I'm givin' her all she's got, Captain!'
Always check the roof – you never know what might (not) be lurking there
Who, Me? August is now just a memory, but hey – console yourself that Christmas is just around the corner. Or simply grab a caffeinated Monday beverage and take delight in another's pain courtesy of The Register's regular Who, Me? column.
Today's tale comes from a reader we'll call "Jack", who spent the early part of this century working on a project to unify the land-based communications for some parts of the US military.
The project, Jack recalled, included networking, video, audio, phones and so on. "The system," he told us, "employed a mixture of Sun Solaris Unix servers and Microsoft Windows 2000 servers (and workstations)."
The data centre took up an entire floor of one of the larger buildings on the US base. "The non-classified server room was about the size of a football (US, not soccer) field, and a smaller classified server room. Help Desk and NOC (network operations control) were also located there."
Jack, who had been originally hired as a Windows administrator, got his first inkling that all might not be well with the project when he studied the reams of documentation supplied by engineers for the setting up of Microsoft's finest. It lacked TCP/IP settings, which, as anyone who has had the misfortune to sit in front of the install wizard of the operating system knows, are a tad important.
But hey ho, Jack was (and is) a professional and made do with what he had.
"After a year or so," he told us, "all the hardware/software was installed and configured; and about 50,000 end users were live on the system." The eventual plan was to have 450,000 users on the thing.
Then, one summer weekend (and it is always the weekend, isn't it?), the power to all that gear abruptly failed.
Now, there are three important things to consider here. Firstly, the DC was located in a US state notable for hot summer days and warm summer nights. Secondly, power was naturally backed up by a generator that was seated on the roof: "It was quite large, having powered all the previous mainframes and terminals."
And thirdly: "It was strongly discouraged [for us to] actually go to the roof of the building to SEE the generator system."
As seasoned readers have probably already guessed: "It was NOWHERE NEAR large enough to run all of the servers, NOC and HD workstations, and other power needs."
While the battery-backed UPS units shrieked in their death throes, a hurried shutdown ensued. Being a government operation, "no one had ever thought of setting up a Shutdown Order Document (and the companion Start Up Document)," said Jack, before acidly observing: "Those documents became a priority after all of this."
Jack called his boss at home to agree which servers absolutely had to stay up. With the batteries under the floor set to wheeze their last in less than 20 minutes, tough decisions had already been taken by the team.
Eventually the gang got things powered down to the point where the asthmatic generator could handle the load. A lucky few got delegated to dealing with irate users calling in (those that could – a large chunk of the phone system had fallen victim to desperate power-saving measures) and, said Jack, "all was well in server land."
Alas, all was most definitely NOT well with the chillers responsible for keeping the server room cool. "For some unknown reason, NONE of the chillers were even set up to be powered by the generator," said Jack.
"The server room started warming up from the usual meat-locker temperature. Felt kind of good for a bit, except [for] the annoying little bell in our heads that it was going to get too warm. And – it did."
Even with the huge reduction in powered-on equipment, the temperature still climbed into the 90s (Fahrenheit – which is the 30-somethings in Celsius) until, after a day or so (which must have seemed much longer), power was finally restored.
The bonus day turned out to be handy as it gave the team time to devise a plan for bringing all those abruptly terminated Windows and Unix servers back to life. Sunday (and into the night) was spent nervously pressing the "On" button and watching services spin up.
"Upper Management," said master-of-understatement Jack, "realized that [a] much larger generator and considerable rewiring was needed."
Even then, it still took a year to get the thing set up. "We were," said Jack, "quite fortunate not to have a repetition of the 'Event'."
While Jack, now semi-retired, has long departed the base, the moral of the story is one that could apply to oh-so-many projects today, government or private: "Poor planning never ends well."
Ever seen the best (or worst) made plans crumble at their first encounter with reality? Maybe some of those plans were yours? Drop Who, Me? a line and confess all to The Register readership. Discretion assured. ®