
BOFH: One double espresso from meltdown

Total Component Fatigue


Episode 16 BOFH 2004

A man could go mad in this business.

One minute you’re hanging off the front of a mainframe shouting about how you’re king of the world - the wave of technology heralding an install which would make even the most hardened geek weep just from the ingenuity, the next moment you’ve got a SCSI card in your hand, not able to understand how, when you put it into a specific machine, it fails to see the devices connected to it.

You plug the card into another box, no problems, the devices make themselves known as they should. You plug it back into the machine it’s supposed to work on, nothing.

When hardware decides to misbehave, it really decides to misbehave.

Unlike the precocious child who will taunt you mercilessly, knowing just how to report the beating they deserve to their school teacher in a manner that will have you in Police custody before lunchtime, hardware is sneaky.

When hardware decides to misbehave, it starts out small. One tiny device doesn’t work properly, but everything else is working exactly as it should. You wander in completely unprepared, believing it's a simple dud disk or a loose cable – because let’s face it, that’s what the statistics would suggest.

In today’s case the cable is fine, and replacing the disk has no effect.

I mentally toss up the possibility that it’s an addressing problem – something to do with that particular address - and change the address of the drive and probe the SCSI bus.

Another disk disappears.

I take a quick break to clear my rage and grab another coffee while I'm at it.

While I’m there, I realize that the new test address I chose must have conflicted with the address of the newly missing drive.

>clickety<

Then again, maybe not…

“What’s the matter?” the PFY asks, blundering into a situation that could escalate out of control at a moment’s notice if my temper doesn’t remain in check…

I fill him in on the sordid details, the Boss asking me to get the server up, me lightly saying that it would be up in half an hour, tops, and then the myriad of hardware upsets till now.

“So where are you at?”

“A hardware wizard has popped up on the desktop asking if maybe I want to remove some SCSI devices. But it only has a 'Yes' and 'No' box.”

“What other buttons were you expecting?” he asks, voice laced liberally with sarcasm.

“The button saying ‘F-ck off. If I’d wanted a f-ing hardware wizard to read my f-ing mind, I would have configured the f-ing thing in the first f-ing place. Only I didn’t get an option to NOT install the f-ing hardware wizard, did I? NO, because someone at Operating System Central thought that every f-ing one would want a f-ing hardware wizard to make inane suggestions’,” I say.

“Ah THAT button,” the PFY says. “Say, how many double espressos have you had?”

“Three or Four. Why?”

“Well, I just noticed that you were a little – just a little, mind - testy, and maybe it’s time to take a break…”

“Yes, only I’m on a bit of a time budget with this box. Besides, the Espresso figure was only from this afternoon.”

“And how many this morning?”

“About ten I guess.”

“So you’re just taking your caffeine level past the medical definition of ‘stimulant’ into the ‘poisons’ category?”

“Whatever,” I blurt distractedly. “What’s coming up on the monitor now?” I ask, holding the cable in a semi-angled position.

>clickety<

“Yep, I can see them: 5 disks, addresses 1, 2, 3, 4 and 5.”

“Bingo – it’s the socket in the cabinet!” I say triumphantly. “One of the connectors in the plug must have moved slightly, probably because the cable’s been bent around the place a bit, putting a lot of stress on the socket and forcing open a contact!”

“So how would you see ANY disks on the Bus?” the PFY asks.

“Simple,” I say smugly, knowing my experience in this particular field is far superior to the PFY’s own. “The pin concerned is one of the addressing pins.”

“And so how come you had another drive disappear?”

“The physical stress which caused one pin on the connector to fail has most likely caused another to become intermittent.”

So simple when you know how. I put all the disks into a new box and throw out both the box and the old cable, just to be on the safe side.

“Up she comes!” the PFY says, powering up the disk box and rebooting the machine.

“And?” I ask, pushing in front of him.

. . .

“No disks found!” he gasps.

. . .

“So how’s that server going?” the Boss asks, wandering in after what he believes to be a safe interval.

“It’s a hardware problem,” the PFY says. “Uh… >flick< >flick< Transient Component Fatigue.”

“Really?” the Boss asks. “I’ve never seen that. Where’s the machine?”

“Well, there’s a bit of it over there in the corner, a bit of it under the desk, and some of it on the table.”

“Bloody Hell! It looks like it’s been hit with a sledgehammer!”

“Yes, TCF is particularly nasty,” I add, helping myself to another coffee. “Also known as TOTAL Component Fatigue. The box basically just falls to pieces.”

“Amazing! Although I did wonder if it might have been that batch of cheap cables we bought a couple of months back.”

“Cheap cables?” I gasp, as the blood rushes to my head and things start to go a little red…

. . .

"You people seen your Boss?" the Head of IT asks.

"Went home sick," the PFY says. "A touch of TCF."

"TCF," the Head chuckles. "You know back when I was an apprentice, that used to mean something had been hit with a sl... Oh. So who's going to be on the interview panel?"

Give him his due - for a computer crusty, he's a quick learner... ®
