BOFH: The Great Patch Mismatch

Halon, the noblest of gases

Episode 13

"It's just a minor ROM patch," the service engineer bleats. "It'll only take five minutes."

"Yeah... Nah," the PFY says.

"It's minor - just addresses a couple of memory leaks and cookie issues in the web interface."

"Yeah. Nah," I repeat.

"It's just the interface - the UPS will be completely unaffected!"

"Nope," the PFY says.

"My assistant and I subscribe to the belief that if it ain't broke it don't need fixing," I add.

"But it's a mandatory fix!"

"We've got a mandatory change freeze on server room gear."

"Till when?"

"Hang on, I'll just check the thermometer..."

"What for?"

"To see if Hell's frozen over..."

"But if you don't do the upgrade you won't get the advanced diagnostics!"

"Which would uncover hitherto undetectable problems?"

"Uh... No..."

"Provide us with enhanced predictive failure?"

"No."

"Report error conditions in a different manner?"

"Uhhh... YES!"

"Yes?"

"Yes, instead of the light going orange when there's an error, it now goes red. And the code on the diagnostic display changes from 000 to :-["

"Well that seems to be worthwhile having!" the Boss says, having slipped into Mission Control unnoticed.

"No it's not!" I chip in.

"It is," the service droid says. "And each time you reboot and the self-test passes, the status changes to :-]"

"We should get that," the Boss says, with not even a hint of sarcasm!

"Nah, we shouldn't," the PFY says.

"It's completely transparent, and it can all be done online!" the engineer says, looking to the Boss for papal intercession.

...

"It's one of those shiny bead situations," I say angrily to the PFY a few minutes later when the Boss has vetoed our veto and told the engineer to go ahead. "Like those RAM fail front panel LEDs."

"?" the PFY asks.

"RAM Fail LEDs, on the front panel instead of on the motherboard. Your RAM craps out and the corresponding LED lights up on the front panel. Everyone loved them until they found out you had to disconnect the front panel to open the lid. The lamps would go out and then you'd realise that the front panel LEDs were numbered 1 to 8 whilst the RAM modules were labelled A1 to A4 and B1 to B4, so then you'd have to guess whether faulty module 5 was module B1 or A3. But they bloody looked cool when your server fell over - and that's the important thing."

"Well if the UPS upgrade is transparent then what diff..."

>beep beep<

"What's that?" the Boss asks, obviously hovering outside Mission Control with conscience pangs.

"That's Nagios reporting that the UPS interface is offline."

"So it's down?"

"The interface, yes. But not UPS1. If it was UPS1, my desk would be down and the mains fail buzzer on the PFY's desk would sound. Unless UPS2 went down too, and then there would just be silence."

. . .

"I... need to restart the interface," the engineer says, leaning around the doorway to Mission Control after a few minutes of tapping and prodding.

"Because you mixed up the terms transparency and opacity?" I ask.

"It'll only take a minute."

"I think we'll schedule that for another time," I say.

"But I've already put the service key in."

"Then take it out."

"If I take it out the interface will restart!"

"Then leave it in."

"If I leave it in the UPS will stay in Bypass mode."

"YOU PUT IT IN BYPASS MODE!!" the PFY snaps.

"Shouldn't it warn you that it's in bypass?" the Boss asks.

"NOT IF THE BLOODY INTERFACE ISN'T WORKING!" I seethe.

"Well we need to get it back online don't we?" the engineer hints.

"No, we'll use the external mechanical bypass and then restart the UPS," I say.

"It's in internal battery bypass," the engineer explains. "It's still converting to DC and back out to AC, so if you put it in external bypass without telling it what you're doing it could short out the inverters."

"Then tell it what you're doing!" the Boss says, predictably.

"I can't - the interface is down!"

"So if you restart the interface, will it shut down the UPS?" the Boss asks.

"No, the interface and the UPS aren't connected like that. It's just a serial connection."

... 10 minutes later...

>click! Whhrrrrrrr..< >buzz buzz< >buzz buzz< >buzz buzz<...

"Not connected like that," I say to the Boss as my desk descends into dark silence.

"So the servers are down?"

"No, 98 per cent of them will still be up because we have dual UPS units."

"Don't worry!" the engineer says. "It's coming up now. Be back in a jiffy!"

. . .

"Mmm?" the PFY says as the engineer pops his head back into Mission Control cautiously.

"It's detected a ROM mismatch with the other UPS."

"How?" the Boss asks.

"They're connected. So the updated UPS has seen the other UPS is back-rev and won't complete start-up until the other one is up to date. We need to update the other UPS."

"And have the whole room power down? I don't THINK so!" the PFY snaps.

"No, all we need to do is update the firmware. It doesn't have to boot from the new firmware, it just has to have it on board."

"Can't we just disconnect the cable so that the UPS thinks it's standalone?" the PFY suggests.

"No, it's in a cluster."

"Can't we uncluster it?"

"Sure, we can do that from the interface once it's booted."

"So... we're going to have to update the other UPS - but NOT reboot it?"

"Yes, it's just a patch, not a restart."

... Five minutes later ...

"And this," the PFY says to the Boss, "is what silence is like. Notice the absence of phone calls - because the phone system hangs off the UPS units."

"Notice," I say, "the sudden lack of fresh air - because the aircon system has noted a power anomaly and shut down all the large chiller and fresh-air-fan motors."

"Notice," the PFY says, "the sound of sirens in the distance - because parts of our fire system which SHOULD have been replaced use normally open relays to detect a fire condition."

"Notice," I say, "the complete lack of sound from the server room. Not a single whirr, no buzzes, no clicks. Not even the sound of an engineer trying to explain something - because the halon system*'s connected to the crappy old relays."

"Wasn't it all brand new sensors in there?"

"Who can tell? Perhaps you can go and check?"

"Uh... no, I'm sure you're right," the Boss says nervously.

"No seriously, go and check!"

"No, it's fine - I'm sure I can see perfectly through the viewing window."

"Ah good, so at least we salvaged something from today!" the PFY comments.

"?"

"You know what transparent means now..." ®

* A fire suppression system that works with halon gas.
