What Compsci textbooks don't tell you: Real world code sucks

Bodged code, strapped-on patches, beellion dollar screw-ups... and that's the good stuff

By Dave Mandl

Posted in Devops, 21st December 2012 10:19 GMT

There’s a kind of cognitive dissonance in most people who’ve moved from the academic study of computer science to a job as a real-world software developer. The conflict lies in the fact that, whereas nearly every sample program in every textbook is a perfect and well-thought-out specimen, virtually no software out in the wild is, and this is rarely acknowledged.

To be precise: a tremendous amount of source code written for real applications is not merely less perfect than the simple examples seen in school — it’s outright terrible by any number of measures.

Due to bad design, sloppy or opaque coding practices, poor scalability, and layers of ugly “temporary” patches, such code is often difficult to maintain, harder still to modify or upgrade, painful or impossible for a new person joining the dev team to understand, or (a different kind of problem) slow and inefficient. In short, a mess.

Of course there are many exceptions, but they’re just that: exceptions. In my experience, software is, almost as a rule, bad in one way or another. And lest I be accused of over-generalising: in more than 20 years I’ve done work for maybe a dozen companies, almost all of them in the banking industry and many of them household names.

The technology people employed at these companies are considered to be the very best, if only because the pay tends to be so good. I’ll play it safe and stick to my actual experience in the financial sector, even though I’m convinced this state of affairs is not limited to that one industry.

Getting back to the cognitive-dissonance problem: in casual discussion, developers and tech managers will talk about all the wonderful things their system does, the stellar technical skills of their team, and how much their users love them — and all that may be true.

But talk privately, colleague-to-colleague, to one of these developers about the quality of the code base, all the daily headaches, the quick hacks and patches, the laughable mistakes made by the original author of the system (who left the firm a couple of years ago), or the fear that the person who “knows the system” will leave for another job, and you’ll hear a different story: “Of course there are problems. Everyone knows that. Things are always this way — it’s barely even necessary to mention it.”

Very few coworkers with whom I’ve broached this subject have seen things differently, and I’ve often heard stories of costly screw-ups that would shock the most jaded techie. But on a daily basis, in all but the worst cases, it’s easier for developers to talk about things like what their system does, or how elegant its user interface is, than to dwell on any horrors lurking inside.

It also may be that, after years of working on a system with serious maintainability flaws, people simply become accustomed to the strange procedures they have to go through regularly to keep things running.

Complex systems + borked code = beelllions down the drain

In the financial business there have been several software-related blowups in the last few years that were big enough to make it onto the evening news. To name just three: the Nasdaq failure that wreaked havoc with Facebook’s IPO; a trading fiasco at Knight Capital in August that led to widespread market disruption and a $400m drop in Knight’s market value; and the “flash crash” of May 2010, which caused market losses of at least $1 trillion in a matter of minutes.

System glitches and bugs this visible and this costly are relatively rare, but for every one of them there are a hundred smaller ones that only a handful of people ever hear about. A Reuters article this summer with the title “Morgan Stanley Smith Barney Rainmakers Consider Exit” said this: “Several dozen Morgan Stanley Smith Barney advisers who manage tens of billions of dollars of client money are considering leaving the firm, saying that widespread technology problems have made it very difficult for them to do their jobs.” (Italics mine.)

These are all outright failures in highly complex systems, but poorly written code can crop up in applications of any size, and it may not lead to a direct, quantifiable loss. It will, however, require untold extra hours of work for routine support, make even minor upgrades painful, or force systems to be retired prematurely (in some cases, before they’ve even gone live). How does this happen?

Crappy software: Is it bad programming, or is it 'too good'?

The most common reason for the existence of bad software is bad programmers. Good software, misleadingly, is usually easy to read, but it’s not easy to write. There are an awful lot of developers out there who never learned the correct way to do things. Maybe they’re so enamored of a particular technology or coding technique that they insist on using it whether it’s appropriate or not (“If the only tool you have is a hammer…”). Maybe they’re in over their heads on a project with a huge number of moving parts. Maybe they’ve been forced to pick up an unfamiliar language at a moment’s notice. Or maybe their thought processes just don’t translate into logical, supportable code.

The best technologists will usually seek out the least boring work, or the highest compensation, but even the most exciting project may be staffed with bad programmers merely because budget constraints, or stinginess, prevented the firm from shelling out for more talented ones.

At the other end of the spectrum, many projects are sabotaged by developers who are “too good” — that is, people who insist on coding everything in the most complicated and impenetrable way possible. This may be because they feel the constant need to show how much they know, or because doing things the simple way is just not interesting enough. As one friend of mine, a heavyweight who has had to rewrite many terrible applications, once said to me: “They think that if they’re not writing 80 lines of code to add two numbers, they’re not using their education.”

In my experience, these people can cause more harm than anyone else. I’ve seen developers use the most tangled object-oriented techniques to do things that could have been accomplished much more easily with a trivial 10-line function. In C++, an everything-but-the-kitchen-sink language used heavily on Wall St, templates (to give just one example) enable this kind of behavior by allowing you to create the most esoteric generic classes imaginable.
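To make that concrete, here’s a deliberately silly sketch of the disease, in the spirit of my friend’s quip about 80-line addition. All the names are invented for this illustration; it’s not code from any real system.

    #include <iostream>

    // The "educated" version: a generic, policy-driven framework
    // for combining two values. Every piece compiles and works,
    // and every piece is unnecessary.
    template <typename T>
    struct DefaultAdditionPolicy {
        static T combine(const T& lhs, const T& rhs) { return lhs + rhs; }
    };

    template <typename T, template <typename> class Policy = DefaultAdditionPolicy>
    class BinaryAccumulator {
    public:
        explicit BinaryAccumulator(const T& seed) : value_(seed) {}
        BinaryAccumulator& absorb(const T& operand) {
            value_ = Policy<T>::combine(value_, operand);
            return *this;
        }
        T yield() const { return value_; }
    private:
        T value_;
    };

    // What the task actually called for.
    int add(int a, int b) { return a + b; }

    int main() {
        std::cout << BinaryAccumulator<int>(2).absorb(3).yield() << "\n"; // prints 5
        std::cout << add(2, 3) << "\n";                                   // prints 5
    }

Both versions print 5. Only one of them can be handed to a new team member without a two-day walkthrough.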

In one case, I had to take over development from a C++ guru who felt the need to do everything in the most opaque, “sophisticated” way possible, and his components simply had to be scrapped and rewritten from scratch. I couldn’t begin to understand the code, and neither could a colleague who was one of the best C++ developers I’d ever worked with. Four solid months of work in the trash bin.

If the original developer had stayed with the firm and finished the project, that would only have deferred the day of reckoning, since no one could ever have taken over support of this monster. (The joke name on Wall St for this kind of situation is “job security”: The sole expert on this system could never be sacked.) But even if the code had been marginally comprehensible, support would still have been a nightmare for anyone but the original developer, and it’s likely that a new person would have broken things by trying to make changes to delicate classes that he didn’t fully grasp.

Time-savers and face-savers

Another source of bad code is laziness. For programmers, there’s “good” laziness, which drives them to build tools that will relieve themselves and others of unnecessary drudge work — that’s what we’re here for, after all. And then there’s “bad” laziness, the kind that leads programmers to cut corners or do things in the quickest possible way, rather than taking the three extra hours to do them right. This always comes back to haunt someone—possibly the person who takes over six months later and doesn’t know that this tiny block of exceptional code exists, or why. Patches, almost by definition, are changes made without thought to the long-term consequences, and often sloppily, because they’re usually considered “temporary.”
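Here’s a hypothetical sketch of that tiny block of exceptional code; the function and account ID are invented for illustration, but the shape will be familiar to anyone who has supported a production system.

    #include <string>

    // Returns the commission rate to apply to a trade.
    double commission_rate(const std::string& account_id, double base_rate) {
        // TEMPORARY: waive commission for this account until the Q3
        // billing dispute is settled. (That was two years ago. Nobody
        // remembers the dispute, and nobody dares delete the branch.)
        if (account_id == "ACCT-4711") {
            return 0.0;
        }
        return base_rate;
    }

Six months later a new maintainer rewrites the function, never suspecting the carve-out exists, and one client’s bills quietly change.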

For the record, however, I don’t think I’ve ever seen anyone go back and clean up a quick-and-dirty fix made two years previously just because it was the right thing to do. If the system is working, almost no manager will pay just to have you recode a piece of it “the right way,” without adding any new functionality. There’s always something more important that needs to be done—until that quick-and-dirty fix blows up and (because it’s urgent) gets replaced by another quick-and-dirty fix. To some lazy programmers, it must be said, none of this matters: They take the easy way out precisely because they know they won’t be around when their time bomb explodes.

There are languages that by their very nature make it easier to write bad code. As much as I love APL, a powerful language I once worked in that makes heavy use of Greek letters and other cryptic symbols, it’s easily abused, and I’ve seen some horrific APL systems written by people who hadn’t been trained properly.

(Unfortunately I had to support one such system early in my career. I prayed every day for a quick, painless death.)

[Embedded video: Conway’s famous Game of Life implemented in one line of Dyalog APL]

If, as an exercise, you wanted to write a program that no one in the world could make heads or tails of, the K language would make that a breeze: I once worked in a group that had a large codebase in K (which as it turns out is a distant, ugly relative of APL), and it never took me less than a half hour to decipher any one line of it.

As mentioned above, C++, despite its superficial similarities to Java, makes it infinitely easier than Java to write impenetrable code. And one language I’ve been warned about, though I’ve never had the opportunity to use it, is Haskell, a purely functional relative of ML. According to a friend in academia who’s studied it, it’s “the Taliban version of ML,” in which it’s all but impossible to write readable code.

[Image: a Haskell one-liner that prints all the powers of 2, as explained on Stack Overflow]

Ultimately, the greatest enemy of good programming practices is time. One of the reasons the code in your textbook is perfect and the code where you work isn’t is that the author of the book was allowed, or forced, to do things right.

In the real world, tight budgets, shortsighted managers, and unreasonable expectations from non-techies almost always conspire to make developers do things too quickly. The final product may be good enough now, and be perfectly understandable to the people who’ve just written it, but all that will change in a year, when there are new requirements and a new set of developers grappling with the hastily thrown-together code. Additionally, the codebase in even a small production system can be orders of magnitude bigger than in most textbook examples, and large systems are far from easy to build. Despite protestations to the contrary, projects greater than a certain size and complexity (see the Reuters article cited above) are almost guaranteed to fail in some way without sufficient time for planning, design, testing, and adult supervision.

All the above aside, there’s one simple and completely painless way to prevent future generations from cursing you when they look at your code: Include some comments! ®