IBM insider: How I caught my wife while bug-hunting on OS/2
No wonder that chkdsk flaw was never fixed
Testing was a concept Microsoft struggled with
Testing OS/2 was a concept Microsoft struggled with at the time. More than once I had to deal with Redmond code that was reported to have passed and failed various quality assurance tests with complex behaviours, yet for some reason that code consisted solely of a RETF instruction, which (in theory) simply ends a subroutine call.
Their code reviews were a joke. Their developers put source code comments of the form “Skip over random IBM NLS shit” in the support for national languages, and the comment “if window count is zero return false” was next to a line that always returned zero. Microsoft flatly refused to fix this one, saying that it conformed to Redmond's coding standards, a copy of which we never managed to acquire, if it actually existed.
We, on the other hand, were regarded as hopelessly bureaucratic. After Microsoft lost the source code for the actual build of OS/2 we shipped, I reported a bug triggered when you double-clicked on Chkdsk twice: the program would fire up twice and both would try to fix the disk at the same time, causing corruption. I noted that this “may not be consistent with the user's goals as he sees them at this time”. This was labelled a user error, and some guy called Ballmer questioned why I had this “obsession” with perfect code.
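For readers wondering what a fix for the double-launch bug might have looked like, here is a minimal sketch in Python of a single-instance guard. It is not how Chkdsk or OS/2 worked; the lock-file approach and the names `try_acquire`, `release` and `chkdsk-c.lock` are all hypothetical, chosen purely to illustrate the idea that the second copy should notice the first and back off.

```python
import os
import tempfile

def try_acquire(lock_path):
    """Return a file descriptor if we are the first instance, else None."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None

def release(fd, lock_path):
    """Close and delete the lock so a later run can start."""
    os.close(fd)
    os.remove(lock_path)

# Simulate two launches of the same (hypothetical) disk tool.
lock_path = os.path.join(tempfile.mkdtemp(), "chkdsk-c.lock")
first = try_acquire(lock_path)    # first launch: gets the lock
second = try_acquire(lock_path)   # second launch: refused
release(first, lock_path)
third = try_acquire(lock_path)    # once the first run ends, a new run may start
release(third, lock_path)
```

A guard this simple leaves a stale lock behind if the program crashes, so treat it as an illustration of the principle rather than a design.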
IBM had sheets of stats on productivity and financials, the software development equivalent of AIDS. For no good reason, IBM thought software quality correlated to things like the abundance of newline characters in the source code.
So Big Blue extended its bizarre measure of productivity - the number of KLOCs, or 1,000 lines of code - to such an extent that the source code editors we used came ready with macros to bulk up your code; for example, they would extend C comments over multiple lines to make your code pass insanely dumb metrics. Suddenly, everything looked good.
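To see how easily a line-count metric is gamed, consider this small Python sketch. It is not the editor macro we had; `bulk_up_comment` and `count_lines` are made-up names, and the one-word-per-line layout is just one assumed way of padding a C comment.

```python
def count_lines(source):
    """The 'productivity' metric: how many lines the source has."""
    return len(source.splitlines())

def bulk_up_comment(comment_text):
    """Spread a one-line C comment over one line per word."""
    words = comment_text.split()
    lines = ["/*"] + [" * " + word for word in words] + [" */"]
    return "\n".join(lines)

# The same single statement, with a plain and a padded comment.
original = "/* return the window count */\nreturn count;"
padded = bulk_up_comment("return the window count") + "\nreturn count;"

original_score = count_lines(original)
padded_score = count_lines(padded)
```

The executable code is identical in both versions, yet the padded one scores several times higher.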
Because we were starting from a largely clean base, we could do things right with OS/2. Even with the benefit of more experience and hindsight I think nearly all of the engineering decisions were the right ones and the implementation was pretty sound by the standards of the time.
IBM's Personal System/2 PC was announced at the same time as OS/2, and the computer was supposed to primarily run our operating system - but the first shipments of the hardware ended up using PC-DOS.
Most OS/2 developers at IBM and Microsoft not only didn’t use PS/2s, we weren’t even aware of their existence until too late. The parallel with Windows Surface tablets is quite striking here - a spurious marketing connection between hardware and software. Whereas PS/2s struggled to run OS/2 properly, the Surface can’t run a decent version of MS Office at all and is incompatible with Windows on purpose.
OS/2 was elegant
As OS/2 quickly evolved from an extended DOS, shared libraries and threading support were added to the mix as was the idea that the operating system's software interface - the API - should be carefully designed rather than allowed to become a mess of randomly named functions.
The API was coherent enough that you could guess the order and type of parameters because they followed a pattern without reading the documentation.
There were real arguments over the API design, though, and it was not astonishing to see a six-page change request for the name of one API call. This doesn’t sound too bad until I share that the call was eventually named WinBeep. There were even existential arguments over whether beeping should be allowed.
Still, my favourite was the SheIndicatePossibleDeath whereby the Shell (She) would signal that the system was not well and that steps should be taken to recover or gracefully restart. The Microsoft devs thought this was hilarious and instead felt that the Trap D black screen of death was all a user needed to see in such cases.
Oh, I see! Trap code D, it's so obvious what's wrong
They, of course, eventually demonstrated their superior skills by upgrading Windows NT to the blue screen of death and giving secretaries, accountants and other office users such vital information as the various memory addresses involved in the screwup so that they can patch a dodgy device driver.
So, were there any API code examples?
No, you fool. The OS/2 API documentation was programming-language neutral. Some examples of actual code were in the development kit, and they were of high quality, but they were perhaps one per cent of what needed to have been written.
Documentation was hard to maintain due to the sheer speed at which changes were made to the API. One of the times I made myself unpopular involved revealing a simple mathematical model that showed the rate of change of the system was so high that our developers could not keep up and the testers could not even write the tests for it.
This meant the project would either be late or never ever reach completion and there was no third possibility: shipping it on time. I wasn’t the only person pointing this out, but no senior IBM manager wanted to be the first to say with authority that the product would be late. I can’t believe you haven’t seen that on your own projects.
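The model itself is not reproduced here, but the general shape of the argument can be sketched in a few lines of Python. Everything in this example is an assumption for illustration: the function name `weeks_to_finish`, the idea of counting work in items per week, and the numbers used.

```python
def weeks_to_finish(backlog, new_per_week, done_per_week, max_weeks=520):
    """Weeks until outstanding work reaches zero, or None if it never
    does within max_weeks."""
    for week in range(1, max_weeks + 1):
        # Each week the system changes (new work) and the team catches up (done work).
        backlog += new_per_week - done_per_week
        if backlog <= 0:
            return week
    return None

# Team outpaces the rate of change: the work eventually ends.
catching_up = weeks_to_finish(backlog=100, new_per_week=5, done_per_week=10)

# Rate of change outpaces the team: the backlog only grows.
falling_behind = weeks_to_finish(backlog=100, new_per_week=12, done_per_week=10)
```

When changes arrive faster than they can be absorbed, no amount of waiting gets the backlog to zero, which is the "late or never" outcome described above.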
I write this article with hindsight, but be clear that I was near the bottom of the food chain and most of the decisions seemed reasonable enough to me.
In spite of this, OS/2 was easier to program than anything else you could find. We knew this for a fact and buried somewhere in IBM are the videos to prove it. IBM hired in skilled programmers from every platform and asked them to carry out various programming tasks in the usability lab and they took longer than they should have.
The problem was that the developers kept on asking “how do I do X” where X was some hack or workaround you didn’t need to do in OS/2. Mac and Windows developers actually seemed quite angry that so many of their favourite kludges were not needed.
After years of manuals that consisted of insider jokes and interesting puzzles, the Unix devs admired our documentation. The DOS programmers thought Christmas had come and wanted to come and work with us.
I tried and miserably failed to get those videos put out as part of the advertising for OS/2, which was, well, really quite like the advertising for Windows 8. You could tell a lot of money was spent, but it left no real reason in your mind to actually do anything about it. In any case, like Microsoft now, IBM wanted to talk to “real people”, not those who understood computers and made the IT decisions for businesses.
Documentation is another part of the whole saga where I share in the guilt of failure. This was before I took up writing, and I could have helped the documentation team more but it was a bit dull.
I was on the inside track of the OS with which MS and IBM both wanted to rule the world; I expected to make real money from the project, so the fewer plebs who understood OS/2 programming, the better for me and IBM.
Or so we thought - look out for part two. ®