BOFH: Moving faster than blame
It's got to be somebody's fault
>clickety< >clickety< >tip< >tap<
>clickety< >clickety< >tap<
>TAP< >TAP!< >TAP!<
"Financials server's not responding again!" the PFY says, looking up from his monitor.
"Hmmm. >click< ping test says it's up - must be just the app."
>clickety< >tap< >clickety< >tap<
"Yep, I can login alright, so it must just... >tap< >TAP!< nope, it's hung."
"DISK!" the PFY and I say simultaneously.
"But it's on the SAN!" the PFY says.
"So's the HR server, and that's still working," I respond.
">tap< >TAP!< >TAP!< ... Nope. It's not," the PFY counters.
...Seconds later in the server room...
"The lights say it's working ok," the PFY says, tapping the SAN box.
"Yeah those two green lamps really tell you a lot," I say. "Course, if the disk activity lamps were doing anything it would probably mean more to me..."
"Ah," the PFY says "I hadn't noticed that. It's crashed then, has it?"
My affirmative response is partially interrupted by the ringing of the server room phone which has a dual purpose - to contact us when we're inside the room and to indicate that the Guild of Idiots has taken up camp outside Mission Control with a view to finding out both what happened and when service will be restored.
We wander outside to talk to the assembled group.
"So what happened?" the head Beancounter wheezes, being a bit tuckered out after the walk from the lift.
"The SAN has crashed," I say.
"When will it be back?" the head of HR asks.
"After it's been reset and done it's data rebuild," I say.
"Have you reset it?" the Boss (and final Guild member) asks.
"Nope, it's remotely managed - remember?"
"No?" the Boss says.
"They do," the PFY says, pointing at the other two guild members.
"What do you mean?" the Boss asks.
"The companies that produce our HR and finance software merged last year and created a single unified product," I explain.
"Their merger was so successful that they approached a hardware vendor to come up with what would be a complete turnkey solution," the PFY adds.
"A turnkey solution so simple that an idiot could administer it," I murmur.
"Although at the time we didn't realise it was a requirement," the PFY says as an aside, nodding at the two department heads.
"The idea was magic!" I continue. "An administrative web page which would allow the users to tailor, start, stop, backup and recover the application without systems people's intervention. And the plan was good - right up until the new company decided that the hardware merger wouldn't be returning the expected gain in profit and so alternative vendors were sought."
"Vendors who could deliver a healthy additional profit," the PFY adds.
"Vendors who didn't know the meaning of the word expensive."
"Or reliable," the PFY again adds.
"So when the combined forces of HR and finance decided that they wanted this solution we suggested that the proposed hardware might not be great for a production system."
"And we were ignored," the PFY adds.
"And so we now have a situation where the disk hardware appears to be as unreliable as the server hardware."
"The server hardware can't be that unreliable - I can't remember the last time it went down," the head of HR says.
"That's because you don't use it all the time, AND because we have a master reset feature at our disposal."
"Master Reset?" the head of HR asks.
The guild follow me into Mission Control and I show them the master reset system.
"Behold," I say pointing high above my head.
"Two pieces of string?" the Boss asks...
"Two pieces of string which go through the wall, through the comms room, through the comms room wall, across the computer room and down to the HR/Finance rack to one of two levers."
"Yes, levers poised above the reset button on the servers. A quick pull on this and your servers go down faster than Paris."
"You can't be serious!" the Boss burbles.
"Sure am! I'm not wandering into the computer room three times a day to restart one of their machines!"
"Well give it a pull then," the head of HR says. "Because we're supposed to be doing a bank run for the salary payments and we've only got 23 minutes."
"I could give it a pull, but it's the disk that's crashed, not the machines. You want us to install a disk reset as well?"
"I... suppose so," the head of HR responds reluctantly.
"I'll get it," the PFY says, heading off to the computer room with a battery drill and a ball of string...
A few minutes later the PFY's back with a concerned look on his face.
"Yes?" the head of HR asks.
"The SAN was crapper than I thought - the reset button broke off when I pulled the lever and is wedged between the panel and a case so it's stuck in reset mode."
"Take the panel off then!" the head of HR gasps.
"It's riveted on!"
"Drill them out!" the head of HR snaps. "You've got 11 minutes!"
"Yeah well... I don't know that that's the best course of..."
"You're wasting time - just do it. You've only got 10 minutes now!"
...2 minutes later...
"I blame the manufacturer," I say to the PFY as I tap the front of the SAN.
"Yeah... security rivets with hardened faces on a piece of crap kit like this," the PFY says.
"I know. A shame the front panel isn't hardened though."
"Or the front of the chassis," the PFY says.
"Or the controller board," I add, gazing into the newly created hole in the now very dead SAN.
"Or all that cabling behind the controller board," the PFY says looking deep into the hole. "...So, back to Mission Control to fess up? I... Hello?"
...microseconds later, at the pub...
>ring< >ring< >ring< >ring<
"Leave that," I say to the Barman as he reaches for my cellphone helpfully. "It'll just be my assistant - about a technical matter."
"Yeah, how to move faster than blame..."