Original URL: http://www.theregister.co.uk/2007/08/06/edbd_vs_tdbd/

Evolutionary vs. traditional database design

DBA fights back

By Mark Whitehorn

Posted in Developer, 6th August 2007 09:05 GMT

We recently published an article on the advantages of evolutionary database design (EDBD), a process which has its roots in the agile/extreme programming world. To provide a little balance, some yang for the yin, we asked Mark Whitehorn to comment on the article and give his views on EDBD vs. the more traditional database design approach.

What is traditional database design?

Just as it is clear that there is no consensus about exactly what constitutes agile modelling and eXtreme programming, the same is true of traditional database design (TDBD). Where three traditional designers are gathered together, there you’ll find four views of how best to design databases. However, for the sake of discussion, let’s assume that we are talking about development based on User, Logical and Physical modelling.

Business analysts talk to users (who have the User model located conveniently inside their heads); from these discussions, the analysts develop a Logical model. This is typically an Entity Relationship (ER) model, which can be signed-off by the users. The developers then take this off into a corner and add the geeky, technical stuff (data types, indexes, etc.) which turns the Logical model into a Physical one. Finally, they push a virtual button in the modelling tool and out pops the database schema appropriate for their engine of choice.
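
For readers who have never seen that last step, here is a minimal sketch of what the 'virtual button' amounts to. It is not taken from any real modelling tool: the entity names, the type mapping and the generate_schema helper are all hypothetical, and SQLite stands in for the 'engine of choice' purely because it ships with Python.

```python
import sqlite3

# Hypothetical Logical model: entities and their attributes, nothing technical yet.
LOGICAL_MODEL = {
    "customer": ["customer_id", "name"],
    "purchase": ["purchase_id", "customer_id", "total"],
}

# Hypothetical Physical detail added by the developers: data types, keys, indexes.
PHYSICAL_TYPES = {
    ("customer", "customer_id"): "INTEGER PRIMARY KEY",
    ("customer", "name"): "TEXT NOT NULL",
    ("purchase", "purchase_id"): "INTEGER PRIMARY KEY",
    ("purchase", "customer_id"): "INTEGER NOT NULL REFERENCES customer(customer_id)",
    ("purchase", "total"): "NUMERIC NOT NULL",
}
INDEXES = [("purchase", "customer_id")]


def generate_schema(logical, types, indexes):
    """Return the DDL statements that turn the Physical model into a schema."""
    statements = []
    for table, columns in logical.items():
        cols = ", ".join(f"{col} {types[(table, col)]}" for col in columns)
        statements.append(f"CREATE TABLE {table} ({cols})")
    for table, col in indexes:
        statements.append(f"CREATE INDEX idx_{table}_{col} ON {table} ({col})")
    return statements


def build_database(statements):
    """Run the generated DDL against a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    return conn


ddl = generate_schema(LOGICAL_MODEL, PHYSICAL_TYPES, INDEXES)
conn = build_database(ddl)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The point of keeping the two dictionaries apart is the same as in the process described above: the Logical model can be discussed and signed off without anyone having to look at a data type.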

If you’re not with us, you’re against us.

One facet of the article under discussion is that it treats traditional database designers somewhat dismissively: ‘data professionals have pretty much missed out on the fundamental ideas and techniques…. They've got a lot of catching up to do.’ and ‘It’s going to be a huge effort to get existing data professionals to recognize and then overcome many of the false assumptions which their “thought leadership” have made over the years.’

While this is guaranteed to cause a glow of schadenfreude-like satisfaction in EDBD devotees, it is also likely to alienate the very people that the article is presumably trying to convert, which I believe is a shame. Any established process like TDBD actively benefits from the occasional challenge to see if it can be improved or should be replaced. Such dialog is not aided by a flame war where the only winners are the sellers of entrenching tools.

The article puts forward the premise that ‘Traditional approaches to database design clearly aren't working out for us’, so let’s start by taking a look at that idea.

Does traditional database design cut it (the mustard, that is)?

I certainly agree that some impressively huge projects have had their share of media attention – no one mention the NHS IT project, for example (oops, too late). Most of us are aware of traditional projects which have gone down so spectacularly that the flames have lit up the sky for miles around.

Even if we look at less illuminating failures there is no doubt that TDBD projects do regularly produce, to quote the original article, ‘tables that have columns that are no longer being used… columns that are being used for several purposes because it was too hard to add new columns … tables with data quality problems’. Indeed, I wouldn’t even try to pretend that this is anything other than a major problem.

Oh, so that’s it, Mark; you accept that the TDBD process is flawed then? Well, no. The EDBD argument at this point appears to be “We see a great number of bad databases, therefore the design process is flawed, and therefore we must change it.” The problem here is the non sequitur between the first two clauses.

Consider the following argument “We see a great number of road accidents, therefore the rule set governing driving is flawed, therefore we must change it.” In fact the majority of accidents occur when people implement the rule set badly. The rule set says don’t drink and drive, but people do. People break the speed limit; they jump the lights and so on. So the accidents we observe tell us nothing about how good (or bad) the rules are.

In like manner, I agree we observe many flawed databases, but simply observing them neither proves nor disproves the efficacy of the process.

So, why do we observe badly designed databases?

In my experience, there are two main reasons why databases end up poorly designed.

  1. Even in this day and age it is relatively common to come across commercially available databases that were initially designed by specialists in the field that would be serviced by the database, rather than by professional database developers. Here I am thinking of accounting databases designed by accountants or HR systems designed by heads of personnel. These applications are often good in terms of the functionality they try to provide but very poor in terms of design.
  2. Equally sadly, but still true, we see commercial databases designed by ‘computer professionals’ who, with the best will in the world, are not trained database developers either and do not have a full understanding of the task and its ramifications. The design does not proceed according to the traditional model.

My experience (and that is all any of us can apply with certainty to issues like this) is that these two account for by far the majority of the poorly designed databases that I’ve come across.

So, do we observe any traditional databases that are well designed? Of course we do. There are plenty of examples but they’re usually unremarkable, invisible even. Well designed, well structured, they just work. In a perfectly fair world they would attract headlines like: “Shock Horror! Database comes in on time, below budget and works! Heads won’t roll!” But, for fairly obvious reasons, they don’t.

Can the traditional model handle change?

Another major criticism aimed at TDBD by the EDBD community is that the traditional approach is poor at handling change. ‘Unfortunately, the traditional data community assumed that evolving database schemas is a hard thing to do and as a result never thought through how to do it.’

So, does TDBD have a mechanism for handling and implementing change? Oh, yes. Users propose a change and, following discussions to ascertain a full understanding of what is required, the change is incorporated into the logical model. This is echoed down to the physical model; a change schema is produced, tested and ultimately applied to the operational database. Does it work? In my experience, it works perfectly well when the process is neither over- nor under-managed, is properly resourced and applied intelligently.

Does it always work? Sadly not. It is demonstrably true that some traditional databases are very, very difficult to evolve. No question, this is also a serious problem and once again I think it is important to look at why. In my opinion, there are two main causes.

  1. The database is initially well designed and, in order to keep it so, the development team goes overboard with processes to control change management. The change process is made cumbersome to the point where it is unworkable. Changes can only be made very slowly; in practice, too slowly to be effective.
  2. The database is initially well designed but poor management thereafter prevents it from being properly maintained. Lip service is often paid to the need for a change management process but in practice ill-managed changes are rapidly and unintelligently applied to the database. These cause the structure to degrade over time, rendering it more and more difficult to change the schema.

In essence, there’s a story here that many database developers will recognize. Rapid changes to the schema are forced upon the development team and the resources (time and effort) required to implement those changes properly are not made available. In vain they complain that such changes will have a detrimental effect on the structure and viability of the database over time. Each time they are told “Yes, yes; thank you very much for your input. We agree that sustainability over time is very important to the business but this is an exception and you will rush this one change through for us. There will be time later to document the changes and rethink the overall design so don’t worry about it.” Eventually the structure becomes so gnarly that any change causes more problems than it fixes.

I’m not in any way trying to excuse these practices; I’m only trying to explain why we see so many poorly structured, poorly maintained databases. It is also clear that many EDBD supporters come from the application development world and it is the slow evolution that really drives them crazy. If it’s any consolation, it drives me crazy too, which is why I have so much sympathy with them; I believe they are trying to address a real problem.

So, have we proved that the TDBD process needs to be replaced?

No, I don’t believe that the existence of badly designed and/or maintained traditional databases proves that the current design process is flawed. Indeed we can argue that the existence of successful TDBD projects helps to support the view that it isn’t the process that is flawed but the implementation.

However, this certainly doesn’t mean that I have somehow proved that TDBD is superior to EDBD. So the next question that needs to be addressed is:

If all other factors were equal, which is more likely to succeed, EDBD or TDBD?

This is an important question and the correct answer is “I don’t know”. No-one knows. Lots of people have opinions but no-one really knows. One reason for the doubt is the sample size. There’s a very large number of examples of TDBD (both well and poorly executed) and very few examples of EDBD, of either sort.

But something we are sure about is that many factors interact to make a database design project (of either flavour) succeed or fail. These factors include, but are certainly not limited to:

  1. The intelligence of the designers and development team
  2. Their motivation
  3. The resources they are given
  4. The methodology they adopt
  5. How closely that methodology is followed

What we are really arguing about here is how important point 4 is when compared to the rest. My personal belief based on my experience is that the other factors listed here play a very important part in the success of a database design project and that 4 is less crucial. Your mileage may vary, but if I am correct, then it follows that changing the methodology is one of the least effective ways of fixing the problem.

To look at this another way, does anyone want to try an EDBD project using poorly motivated, untrained, under-resourced people?

What should we do to address the problem of poor database design?

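Address the other issues, of course: the intelligence and motivation of the team, the resources they are given and how closely they follow the methodology.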

I think the solution sounds easy; so why's it rarely implemented in practice? Because it is, in fact, rather hard. Intelligent people are difficult to recruit, managers who can motivate well are rare, and it costs money to provide the resources. So people cut corners. This is the reality. And none of this will change if we move to EDBD.

So, what about EDBD?

I have tried not to turn this article into an attack on EDBD because I think that doing so is counterproductive. However, I also think it’s worth mentioning one issue that does concern me: TDBD essentially centralises the ER modelling, whereas EDBD essentially decentralises it.

Why do I think that this is a problem? Well, there will always be a tension between the requirements of the enterprise; it simultaneously demands that we:

  1. make changes to the database yesterday in order to accommodate the changing business processes.
  2. provide clear, consistent analytical information that spans years.

In order to ensure the latter we need a clear overview of the data structure and what the data ‘means’. This is one of the many reasons why the TDBD process not only centralises the ER modelling but also considers design in terms of the user, logical and physical models. This isn’t ‘busy work’; it is vital to enable us to keep track of both the data and, equally importantly, its meaning.

My impression is that EDBD has its roots in the application developer community. At worst application developers tend to see the application as king and the database as an inconveniently complex and obstructive repository in which they are occasionally forced to store data. They tend to favour processes that support option 1.

TDBD has its roots in the database developer community. At worst database developers see the database as a temple and applications as annoying processes that, unless watched closely, will trash the temple, despoil the data and mangle the meaning. They tend to favour processes that support option 2.

Our job, no matter what our background, is to use our common sense and provide the best balance we can between these two options.

Dr. Mark Whitehorn has been developing databases for over 20 years. During that time, he has published 9 books and 2 million words in well over a thousand articles. In the commercial world, he works as a database and data warehouse consultant. In the academic world, he holds an honorary position at Dundee University where he teaches advanced data handling techniques. He also holds a research position at Cambridge University where he applies these techniques to improving our understanding of how Darwin developed the theory of evolution (somewhat ironically, given the title of this article).