BOFH: The Great Patch Mismatch

Halon, the noblest of gases


Episode 13

"It's just a minor ROM patch," the service engineer bleats. "It'll only take five minutes."

"Yeah... Nah," the PFY says.

"It's minor - just addresses a couple of memory leaks and cookie issues in the web interface."

"Yeah. Nah," I repeat.

"It's just the interface - the UPS will be completely unaffected!"

"Nope," the PFY says.

"My assistant and I subscribe to the belief that if it ain't broke it don't need fixing," I add.

"But it's a mandatory fix!"

"We've got a mandatory change freeze on server room gear."

"Till when?"

"Hang on, I'll just check the thermometer..."

"What for?"

"To see if Hell's frozen over..."

"But if you don't do the upgrade you won't get the advanced diagnostics!"

"Which would uncover hitherto undetectable problems?"

"Uh... No..."

"Provide us with enhanced predictive failure?"

"No."

"Report error conditions in a different manner?"

"Uhhh.. YES!"

"Yes?"

"Yes, instead of the light going orange when there's an error, it now goes red. And the code on the diagnostic display changes from 000 to :-["

"Well that seems to be worthwhile having!" the Boss says, having slipped into Mission Control unnoticed.

"No it's not!" I chip in.

"It is," the service droid says. "And each time you reboot and the self test passes, the status changes to :-]"

"We should get that," the Boss says, with not even a hint of sarcasm!

"Nah, we shouldn't," the PFY says.

"It's completely transparent, and it can all be done online!" the engineer says, looking to the Boss for papal intercession.

...

"It's one of those shiny bead situations," I say angrily to the PFY a few minutes later when the Boss has vetoed our veto and told the engineer to go ahead. "Like those RAM fail front panel LEDs."

"?" the PFY asks.

"RAM Fail LEDs, on the front panel instead of on the motherboard. Your RAM craps out and the corresponding LED lights up on the front panel. Everyone loved them until they found out you had to disconnect the front panel to open the lid. The lamps would go out and then you'd realise that the front panel LEDs were numbered 1 to 8 whilst the RAM modules were labelled A1 to A4 and B1 to B4, so then you'd have to guess whether faulty module 5 was module B1 or A3. But they bloody looked cool when your server fell over - and that's the important thing."

"Well if the UPS upgrade is transparent then what diff..."

>beep beep<

"What's that?" the Boss asks, obviously hovering outside Mission Control with conscience pangs.

"That's Nagios reporting that the UPS interface is offline."

"So it's down?"

"The interface, yes. But not UPS1. If it was UPS1, my desk would be down and the mains fail buzzer on the PFY's desk would sound. Unless UPS2 went down too, and then there would just be silence."

. . .

"I... need to restart the interface," the engineer says, leaning around the doorway to Mission Control a few minutes of tapping and prodding later.

"Because you mixed up the terms transparency and opacity?" I ask.

"It'll only take a minute."

"I think we'll schedule that for another time," I say.

"But I've already put the service key in."

"Then take it out."

"If I take it out the interface will restart!"

"Then leave it in."

"If I leave it in the UPS will stay in Bypass mode."

"YOU PUT IT IN BYPASS MODE!!" the PFY snaps.

"Shouldn't it warn you that it's in bypass?" the Boss asks.

"NOT IF THE BLOODY INTERFACE ISN'T WORKING!" I seethe.

"Well we need to get it back online don't we?" the engineer hints.

"No, we'll use the external mechanical bypass and then restart the UPS," I say.

"It's in internal battery bypass," the engineer explains. "It's still converting to DC and back out to AC, so if you put it in external bypass without telling it what you're doing it could short out the inverters."

"Then tell it what you're doing!" the Boss says, predictably.

"I can't - the interface is down!"

"So if you restart the interface, will it shut down the UPS?" the Boss asks.

"No, the interface and the UPS aren't connected like that. It's just a serial connection."

... 10 minutes later...

>click! Whhrrrrrrr..< >buzz buzz< >buzz buzz< >buzz buzz<...

"Not connected like that," I say to the Boss as my desk descends into dark silence.

"So the servers are down?"

"No, 98 per cent of them will still be up because we have dual UPS units."

"Don't worry!" the engineer says. "It's coming up now. Be back in a jiffy!"

. . .

"Mmm?" the PFY says as the engineer pops his head back into Mission Control cautiously.

"It's detected a ROM mismatch with the other UPS."

"How?" the Boss asks.

"They're connected. So the updated UPS has seen the other UPS is back-rev and won't complete start-up until the other one is up to date. We need to update the other UPS."

"And have the whole room power down? I don't THINK so!" the PFY snaps.

"No, all we need to do is update the firmware. It doesn't have to boot from the new firmware, it just has to have it on board."

"Can't we just disconnect the cable so that the UPS thinks it's standalone?" the PFY suggests.

"No, it's in a cluster."

"Can't we uncluster it?"

"Sure, we can do that from the interface once it's booted."

"So... we're going to have to update the other UPS - but NOT reboot it?"

"Yes, it's just a patch, not a restart."

... Five minutes later ...

"And this," the PFY says to the Boss, "is what silence is like. Notice the absence of phone calls - because the phone system hangs off the UPS units."

"Notice," I say, "the sudden lack of fresh air - because the aircon system has noted a power anomaly and shut down all the large chiller and fresh-air-fan motors."

"Notice," the PFY says, "the sound of sirens in the distance - because parts of our fire system which SHOULD have been replaced use normally open relays to detect a fire condition."

"Notice," I say, "the complete lack of sound from the server room. Not a single whirr, no buzzes, no clicks. Not even the sound of an engineer trying to explain something - because the halon system*'s connected to the crappy old relays."

"Wasn't it all brand new sensors in there?"

"Who can tell? Perhaps you can go and check?"

"Uh.. no, I'm sure you're right," the Boss says nervously.

"No seriously, go and check!"

"No, it's fine - I'm sure I can see perfectly through the viewing window."

"Ah good, so at least we salvaged something from today!" the PFY comments.

"?"

"You know what transparent means now..." ®

* A fire suppression system that works with halon gas.
