The Register® — Biting the hand that feeds IT

BOFH: The Great Patch Mismatch

Halon, the noblest of gases

Episode 13

"It's just a minor ROM patch." the service engineer bleats "It'll only take five minutes."

"Yeah... Nah," the PFY says.

"It's minor - just addresses a couple of memory leaks and and cookie issues in the web interface."

"Yeah. Nah," I repeat.

"It's just the interface - the UPS will be completely unaffected!"

"Nope," the PFY says.

"My assistant and I subscribe to the belief that if it ain't broke it don't need fixing," I add.

"But it's a mandatory fix!"

"We've got a mandatory change freeze on server room gear."

"Till when?"

"Hang on, I'll just check the thermometer..."

"What for?"

"To see if Hell's frozen over..."

"But if you don't do the upgrade you won't get the advanced diagnostics!"

"Which would uncover hitherto undetectable problems?"

"Uh... No..."

"Provide us with enhanced predictive failure?"

"No."

"Report error conditions in a different manner?"

"Uhhh.. YES!"

"Yes?"

"Yes, instead of the light going orange when there's an error, it now goes red. And the code on the diagnostic display changes from 000 to :-["

"Well that seems to be worthwhile having!" the Boss says, having slipped into Mission Control unnoticed.

"No it's not!" I chip in.

"It is," the service droid says. "And when each time your reboot and the self test passes the status changes to :-]"

"We should get that," the Boss says, with not even a hint of sarcasm!

"Nah, we shouldn't," the PFY says.

"It's completely transparent, and it can all be done online!" looking to the Boss for papal intercession.

...

"It's one of those shiny bead situations," I say angrily to the PFY a few minutes later when the Boss has vetoed our veto and told the engineer to go ahead. "Like those RAM fail front panel LEDs."

"?" the PFY asks

"RAM Fail LEDS, on the front panel instead of on the motherboard. Your RAM craps out and the corresponding LED lights up on the front panel. Everyone loved them until they found out you had to disconnect the front panel to open the lid. The lamps would go out and then you'd realise that the front panel LEDs were numbered 1 to 8 whilst the RAM modules were labelled A1 to A4 and B1 to B4, so then you'd have to guess whether faulty module 5 was module B1 or A3. But they bloody looked cool when your server fell over - and that's the important thing."

"Well if the UPS upgrade is transparent then what diff..."

>beep beep<

"What's that?" the Boss asks, obviously hovering outside Mission Control with conscience pangs.

"That's Nagios reporting that the UPS interface is offline."

"So it's down?"

"The interface, yes. But not UPS1. If it was the UPS1, my desk would be down and the mains fail buzzer on the PFY's desk would sound. Unless UPS2 went down too, and then there would just be silence."

. . .

"I... need to restart the interface," the engineer says, leaning around the doorway to Mission Control about a few minutes of tapping and prodding later.

"Because you mixed up the terms transparency and opacity?" I ask.

"It'll only take a minute."

"I think we'll schedule that for another time," I say.

"But I've already put the service key in."

"Then take it out."

"If I take it out the interface will restart!"

"Then leave it in."

"If I leave it in the UPS will stay in Bypass mode."

"YOU PUT IT IN BYPASS MODE!!" the PFY snaps.

"Shouldn't it warn you that it's in bypass?" the Boss asks.

"NOT IF THE BLOODY INTERFACE ISN'T WORKING!" I seethe.

"Well we need to get it back online don't we?" the engineer hints.

"No, we'll use the external mechanical bypass and then restart the UPS," I say.

"It's in internal battery bypass," the engineer explains. "It's still converting to DC and back out to AC, so if you put it in external bypass without telling it what you're doing it could short out the inverters."

"Then tell it what you're doing!" the Boss says, predictably.

"I can't - the interface is down!"

"So if you restart the interface, will it shut down the UPS?" the Boss asks.

"No, the interface and the UPS aren't connected like that. It's just a serial connection."

... Ten minutes later ...

>click! Whhrrrrrrr..< >buzz buzz< >buzz buzz< >buzz buzz<...

"Not connected like that," I say to the Boss as my desk descends into dark silence.

"So the servers are down?"

"No, 98 per cent of them will still be up because we have dual UPS units."

"Don't worry!" the engineer says. "It's coming up now. Be back in a jiffy!"

. . .

"Mmm?" the PFY says as the engineer pops his head back into Mission Control cautiously.

"It's detected a ROM mismatch with the other UPS."

"How?" the Boss asks.

"They're connected. So the updated UPS has seen the other UPS is back-rev and won't complete start-up until the other one is up to date. We need to update the other UPS."

"And have the whole room power down? I don't THINK so!" the PFY snaps.

"No, all we need to do is update the firmware. It doesn't have to boot from the new firmware, it just has to have it on board."

"Can't we just disconnect the cable so that the UPS thinks it's standalone?" the PFY suggests.

"No, it's in a cluster."

"Can't we uncluster it?"

"Sure, we can do that from the interface once it's booted."

"So... we're going to have to update the other UPS - but NOT reboot it?"

"Yes, it's just a patch, not a restart."

... Five minutes later ...

"And this," the PFY says to the Boss, "is what silence is like. Notice the absence of phone calls - because the phone system hangs of the UPS units."

"Notice," I say, "the sudden lack of fresh air - because the aircon system has noted a power anomaly and shut down all the large chiller and fresh-air-fan motors."

"Notice," the PFY says, "the sound of sirens in the distance - because parts of our fire system which SHOULD have been replaced use normally open relays to detect a fire condition."

"Notice," I say, "the complete lack of sound from the server room. Not a single whirr, no buzzes, no clicks. Not even the sound of an engineer trying to explain something - because the halon system*'s connected to the crappy old relays."

"Wasn't it all brand new sensors in there?"

"Who can tell? Perhaps you can go and check?"

"Uh.. no, I'm sure you're right," the Boss says nervously.

"No seriously, go and check!"

"No, it's fine - I'm sure I can see perfectly through the viewing window."

"Ah good, so at least we salvaged something from today!" the PFY comments.

"?"

"You know what transparent means now..." ®

* A fire suppression system that works with halon gas.
