Déjà Vista

Has writing operating systems changed since OS/2?

Comment

As Vista slowly slips further into the mists of the future, I sometimes wonder if anything has really changed since I was on the losing side of the IBM-Microsoft OS/2 war. Why do we now hear of huge re-writes in a product that's supposed to be almost ready?

As a former O/S (Operating System) bug hunter, I find it all rather familiar. No matter how disciplined you are in architecture, little issues have a way of forcing you to implement wider-ranging changes, and these build up like water behind a dam. This can become a nasty techno/political problem. Managers will play bluffing games, each hoping another team will take the bullet of making the project late. So, you have managers expressing increasingly fictitious optimism, until the breaking point - when suddenly torrents of issues will flood through, with lots of relief (and a bit of schadenfreude) all round.

Then you can "fix" the project, merely by firing managers at random and allowing them to become blame sinks. It would be too cynical of me to say that the management changes we've seen reported here are the result (El Reg's take on this is here). Way too cynical - I can't be right, can I?

It's hard to gauge progress in a huge project like an O/S, so management focuses on a range of statistics and capability milestones. OS/2 reached the point several times where we found bugs faster than they could be fixed. One set was caused by the compiler and processor having different ideas about valid opcodes. Thus, bugs could depend upon the revision level of the CPU, and no code review could help you. Management responded to this more than once by simply banning the finding of bugs, in a denial of bad news that would shame a Soviet spin doctor; you had to be a third-level manager to "approve" a bug at one point.

Bug hunters are seen as the bad guys by individual managers, whatever the top level says [some managers think that the testing process itself actually "makes" bugs, which wouldn't be there if you didn't test - Ed]. They bring bad news. In all organisations the bringers of good news prosper over the bringers of bad. At IBM, however, we bug hunters were generally seen as valuable by the organisation as a whole, not least because we annoyed Microsoft; and, yes, blame management became a big issue, complete with its negative productivity. Microsoft has lots of smart people; "evangelists" who ride out to inspire us all with the wisdom of Microsoft ways. Ever heard of its testers? Reckon they're the best paid people in Redmond?

A tester should have a superset of the skills used by developers; but at too many firms testing is a sin bin, and paid accordingly; and it's hard to see that not being the cause of many disasters on the scale of Vista. At IBM, some of us had a simplistic model. Each line of new/fixed code put into the system has some probability of breaking it in a new way, and this probability grows with the size of code already written. So, for a given quality and size of programming team, there is a maximum size of system. Beyond that, any attempt at change is as likely to break something else as to fix a problem. This steady state is permanent and fatal; have the Vista teams hit it?
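For the curious, that back-of-envelope model can be sketched in a few lines of Python. This is only an illustration under assumptions of my own choosing: the break probability is taken to grow linearly with system size, and the names `risk_per_line` and `fix_probability` (a stand-in for team quality) are hypothetical parameters, not anything measured at IBM or anywhere else.

```python
def break_probability(system_size: int, risk_per_line: float) -> float:
    """Chance that one change breaks something new.

    Assumed to grow linearly with the code already written, capped at 1.
    """
    return min(1.0, risk_per_line * system_size)


def expected_net_fixes(system_size: int, fix_probability: float,
                       risk_per_line: float) -> float:
    """Expected bugs removed per change, minus expected bugs introduced."""
    return fix_probability - break_probability(system_size, risk_per_line)


def break_even_size(fix_probability: float, risk_per_line: float) -> float:
    """System size at which a change is as likely to break as to fix."""
    return fix_probability / risk_per_line


if __name__ == "__main__":
    # Hypothetical team: 90% of fixes work, one-in-a-million risk per existing line.
    for size in (100_000, 500_000, 1_000_000, 2_000_000):
        print(size, expected_net_fixes(size, 0.9, 1e-6))
    print("break-even size:", break_even_size(0.9, 1e-6))
```

Below the break-even size each change makes net progress on average; above it, the expected net fix count goes negative, which is the steady state described above. A better team (higher `fix_probability`, lower `risk_per_line`) pushes the ceiling up but does not remove it.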

Yes, I do mean "teams" plural. The failure of OS/2 commercially was partly down to the decision to do version 1.3, and thus stop 2.0 in its tracks. 1.3 was a result of listening to customers, and was a lighter, faster, crap version of 1.2, whereas 2.0 had loads of cool features like being 32-bit and having the ability to run Windows apps. There are competing teams within any large project – and when one version hits the sweet spot of desirability and plausibility for delivery, do you think that makes them friends?

Individual Microsoft programming units are often larger than several whole companies, and so (it seems to me) "us" may be our bit of Microsoft, and "them" another part of Microsoft; not the official Linux enemy. Also, techies are known for their bitchiness (I have no doubt that this article will prod people to point out some of my screw-ups over the last 20 years) and often bug reports and fix requests are seen as coming from malicious incompetents, not colleagues. We referred to the IBM team rewriting MS compilers as "children playing with matches" and refused point-blank to use their output; and they doubtless had equally unkind things to say about us. "Arrogant prima donnas" is probably all I can use here, without getting Reg Developer added to the banned site list.

The Vista teams must also be hitting "deadline fatigue" by now. People get increasingly cynical about the assertions made by other groups and, after a while, by their own managers, who are torn between honesty with the troops and being seen as a "good team player" by their peers and bosses. Also, in order to actually get a program out of the door, there are always compromises and fudges and things that just happen to work. More than one bug in a Microsoft product has been fixed by deleting something from the manual [that happened back in the days of MVS mainframes too, nothing changes - Ed]. We've all done this, but the more deadlines you rush for, the more "clever hacks" accumulate, and by their nature are not only undocumented, but often unknown to anyone other than the developer who bodged them.

Source code control can fray a bit in the last desperate hours - it's not unknown for the source for the version of memory manager that actually shipped to be "lost". These issues actually make net progress slower, and the sort of managers who genuinely believe that sport is a good metaphor in setting goals for teams often don't realise that trying for an unachievable target doesn't bring out the best in people. In fact, it actually digs a hole for the next wave of cannon fodder to fall into.

DRM (Digital Rights Management) doesn't help. Ever since Fred Brooks wrote The Mythical Man-Month, based on his experiences with OS/360 (and since augmented with new material), we've tried to segment large systems so that bugs don't spread. But to be effective, digital rights must be managed right across the system in a cooperative fashion. That massively increases the effort and bug count, yet as far as the customer is concerned DRM itself is the bug. A "feature" is something you'd pay money to get. DRM does not pass that test.

But I'm not in the OS-writing game any more, thankfully, so I'm looking forward to lots of feedback telling me in detail where my ideas are obsolete. ®
