Original URL: https://www.theregister.com/2011/12/15/safe_engineering/

The moment a computer crash nearly caused my car crash

Engineering mustn't succumb to 'blame the user' culture

By Trevor Pott and Iain Thomson

Posted in Systems, 15th December 2011 10:19 GMT

Sysadmin blog I very nearly had a terrible car accident: my car almost left me stranded on the tracks of my city's light rail transit. The short version of the story is that my car started acting up, of all times, as I was on the way to the mechanic for an oil change.

“Acting up” in this case meant refusing to go above 20kph (12.4mph) for brief periods, then driving normally. This would be followed by a great big “thump” and a drop straight back down to 20kph. At random intervals, my speedometer would gyrate wildly between 40kph (24.9mph) and 80kph (49.7mph), despite the car maintaining a fairly constant speed of around 30kph (18.6mph).

I thought my transmission might be going. Maybe, maybe not; my experiences with transmission failure usually indicate a pretty binary and obvious failure mode that means “car doesn't go”. “Car periodically decides to slow down but not stop” was a new one - I decided to plough on and try to make it the last 20 blocks to the dealership.

My car seemed to behave itself for about two blocks, and then completely stopped responding to input. For five terrifying seconds, I was trapped on the train tracks, idling forward at 2kph, wondering if I would regain control in time to avoid becoming street pizza.

A few adrenaline-charged moments later, the car picked right back up, giving me enough oomph to pull into the nearest parking lot. After my nerves had calmed down, I finished the journey to the dealership; naturally, the car behaved perfectly fine as soon as I pulled onto the lot.

Debugging some hard wear

The mechanic ran a pile of diagnostics and came back with results that shocked me - the problem was, more or less, a computer error.

The issue in detail was that, while I was stopped at a red light, one of my wheels was apparently stuck on a patch of black ice while the other wasn't. The wheel on the ice spun wildly out of control while the other barely moved. This caused both a speed sensor in the transmission and a series of vibration sensors inside the engine (camshaft, etc.) to report an error state.

The traction control and ABS computer didn't know what to make of this input. It is supposed to respond to such conditions by throttling down and judiciously applying the brakes to individual wheels until traction is regained.

In this case, the input was over so quickly that I never noticed the wheel spinning, but it was apparently of such a magnitude as to confuse the computer something fierce. (The additional errors reported by the various sensors apparently added to the confusion.)

Thus the traction control computer was actually trying to save me from what it perceived to be dangerous road conditions, when in fact no such conditions were present. The end result was that the computer placed me in greater danger. (What if it had made such a decision while I was doing 100kph on the highway, closely followed by a semi?)
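What the mechanic described is, at bottom, a control system acting on sensor input it had no reason to trust. Purely as an illustration - this is a hypothetical sketch in C, with invented names and thresholds, and emphatically not my car's actual firmware - the kind of plausibility check a defensively engineered traction-control unit might run over its wheel-speed inputs could look something like this:

/* Hypothetical sketch only: a simplified plausibility check of the kind a
 * traction-control unit might apply to wheel-speed inputs. Names, thresholds
 * and structure are illustrative assumptions, not any manufacturer's code. */
#include <stdbool.h>
#include <stdio.h>

#define WHEEL_COUNT 4
#define MAX_PLAUSIBLE_DELTA_KPH 40.0  /* assumed sanity limit, for illustration */

typedef struct {
    double wheel_speed_kph[WHEEL_COUNT];
    double vehicle_speed_kph;          /* e.g. reported by the transmission sensor */
} sensor_frame;

/* Returns true if the sensor frame looks self-consistent. A paranoid design
 * would treat an implausible frame as a probable sensor fault and log it,
 * rather than fighting the driver on the strength of data it cannot trust. */
static bool frame_is_plausible(const sensor_frame *f)
{
    double min = f->wheel_speed_kph[0];
    double max = f->wheel_speed_kph[0];

    for (int i = 1; i < WHEEL_COUNT; i++) {
        if (f->wheel_speed_kph[i] < min) min = f->wheel_speed_kph[i];
        if (f->wheel_speed_kph[i] > max) max = f->wheel_speed_kph[i];
    }

    /* The spread between wheels should be modest, and no wheel should
     * disagree wildly with the speed the transmission says the car is doing. */
    bool wheels_agree    = (max - min) <= MAX_PLAUSIBLE_DELTA_KPH;
    bool matches_vehicle = (max - f->vehicle_speed_kph) <= MAX_PLAUSIBLE_DELTA_KPH;

    return wheels_agree && matches_vehicle;
}

int main(void)
{
    /* Roughly the scenario in the article: one wheel on black ice at a red
     * light, spinning freely while the car is all but stationary. */
    sensor_frame stuck_on_ice = {
        .wheel_speed_kph = { 55.0, 2.0, 1.0, 2.0 },
        .vehicle_speed_kph = 1.0,
    };

    if (!frame_is_plausible(&stuck_on_ice))
        printf("Implausible wheel-speed data: flag a fault, don't cut power.\n");

    return 0;
}

The point of a check like this is not cleverness but humility: when the inputs stop making sense, a truly paranoid design degrades gracefully instead of overriding the driver.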

Make no mistake, I am not arguing here that this computer – or others like it – is in any way a negative addition to the modern car. That computer has legitimately saved my life on more than one occasion. What is at issue here is user education.

I should have tried turning it off and on again

I am a systems administrator. I work surrounded by computers of every capability and description, all day long, every day. And in the heat of the moment, it still simply never occurred to me that the car's computer could be responsible for such a thing.

Mechanical failures sprang to mind. I wondered if the electrical system could be shot. I even considered an EM pulse.

Not once did I think to push the computer's 'off' button on my dash. It never even crossed my mind.

Now that computers are integrated into every facet of our lives, do we really understand them well enough, given that they are hidden away most of the time? Do we intuitively grasp the input/output relationship, or even give much thought to the presence of computers at all?

And what about us nerds: do we give enough thought to the end users of the computer systems we design? Where do the training and education burdens lie: with the end user, with the designer, or with society at large?

We have become a society that deals with these questions through indemnification clauses, EULAs and meaningless warning stickers. We push the burden of education and discovery onto a populace overwhelmed with information but lacking true knowledge.

It is unrealistic to expect that end users understand everything about all the computers they interact with. It is probably unrealistic to expect that end users are even aware of all the computers they interact with.

Who is to blame?

Despite this, when an incident occurs that exposes a flaw in system design, we blame the user. When there is an unmet need for communication, education or even simple warnings, we blame the user. While perhaps acceptable when dealing with PCs and smartphones, our blame-the-victim culture must not be allowed to extend to embedded systems.

While fretting about shareholder value, we moved our spending away from quality control and over to the lawyers. We obsess over a number here, a column there; budget cuts and indemnifications and pink slips (oh my!).

Computers – especially the critical, embedded computers – need to be properly engineered. Not by a kid who read a book and learned some C#, but by someone with an iron ring. In our mad rush to virtualise this and write an app for that, we have forgotten the importance of quality design and the value of truly paranoid engineering.

At the end of the day, after all the rationalisations, agreements, licences and handwringing, there is a real flesh-and-blood human being. In my car, on the train tracks.

Waiting for the computer to reboot. ®