On data models, data types and dangerous liaisons
Context, dear boy, context
Comment A data model is a methodology for storing, handling and manipulating data. There are lots of them around. One of the most commonly employed at present is the relational model. Brainchild of Edgar Codd, it rapidly came into favour after he published his seminal paper in 1970. Many of the popular database engines today (for example DB2, Oracle and SQL Server) are based on the relational model.
Data models are often relatively complex entities, made up of a set of interwoven checks, constraints and rules. Any one rule, on its own, may make little sense; but the entire set, working together, should produce a useful and internally consistent system.
The complexity of these models means that it often takes some time to get to grips with them. For reasons that presumably only a psychologist could explain, when some humans learn a complex system they can develop an attachment to it. A fanatical attachment. The data model that the poor unfortunate victim has learnt becomes the “one true way”. There are no longer several data models from which to choose, there is one correct data model and all of the others are, quite clearly, evil. By this point, we aren’t simply in “blue suede shoes” territory; we have entered “work of the devil” land.
This is unfortunate because, in truth, all data models have their pros and cons. This is an inevitable side effect of the interlacing set of rules that make them up. Take the relational model. It has a number of rules, systems and features (for example, referential integrity) that allow it to provide relatively high data integrity. The downside of these rules is that, every time data is added to, or modified in, the database, the rules must be deployed to check the integrity of the data. This slows the database down. So the pro also has a con.
If we move to the multi-dimensional model of data, we can build structures that are tens of thousands of times faster to query than would be possible using comparable relational model. We gain speed - but we lose many of the data integrity rules.
Any argument about which of these data models is the ‘best’ is ultimately meaningless, unless we know the context in which it will be deployed. If query processing speed is of enormous importance to the prospective end-user of the system then that is clearly going to influence which of these two data models is ‘best’ in this particular context.
This context is equally important within the structure of any data model; take data types for example. There is a huge range of possible data types out there in the big wide world of databases – integer, binary, text, BLOB, XML, multi-valued and so on.
Asking for my opinion on, for example, the multi-valued data type (MVDT) is like asking whether I approve of dynamite or not. I don’t have a view either way until I know the context.
"So, Mark, you think it is appropriate to carry dynamite in a crowded street do you?" No, of course not. The urban environment provides a very poor context for explosives. What about using it in controlled conditions to remove tree-stumps from the ground? Fine. (If you are an environmentalist who loves earwigs and feels sorry for the trauma inflicted upon them by the use of dynamite in this way, I apologise in advance.)
In other words, neither dynamite nor MVDTs inherently display any “good” or “bad” qualities whatsoever. Such attributes depend entirely upon context.
Now let’s look at MVDTs in the context of Access. Microsoft has recently announced that its PC-based database engine is to acquire this data type. This is a whole new data type for Access, one which it has so far managed to do without, throughout its 14 year life span. Access is a relational database engine. (Please, let’s not get into a discussion about whether it is truly relational or merely close to relational. It is about as relational as any other engine that purports to be relational today.)
One of the rules of the relational model is that all of the data held therein is atomic. In fact, this is one of the core rules of the relational model; by which I mean that it has a huge effect on many of the other components of the model. For example SQL, the language commonly used to manipulate relational databases, is completely based around the idea that the data it manipulates will be atomic. Unless we extend the language, it does not have the tools necessary to manipulate non-atomic data; such as that found in MVDTs.
You are entirely free to think that this is a good model for data and you are equally free to think that this is a bad model for data. Your opinion (and/or mine) does not change, by one iota, the fact that the core internal rules that govern relational databases happen to forbid multiple values in a single field.
The dialect of SQL that Access uses has no specific operators to handle MVDTs. So the introduction of MVDTs into Access will, I believe, make life more difficult for users of Access. I believe that Access developers should eschew their use, as I said here [and, in fact, we were a little surprised when some readers saw this article as a generic put down of MVDTs in any context – The Eds].
Now ask me what I think of MVDTs in the context of non-relational databases such as the PICK-related products (the brain-children of Dick Pick; there is some context here). These database products, such as IBM’s U2 (and even Intersystems Cache, which effectively divorces the logical view from the underlying physical implementation and thus supports the PICK MV model, the relational model and several others at the same time), are designed to use MVDTs and have the necessary operators to manipulate the data in them - so, no problem. MVDTs are perfectly valid in this context.
The relational model is just another data model; but the complex rules that comprise it make it a poor environment for MVDTs. Each on its own is fine; but taken together, the relational model and the MVDT form a dangerous liaison.®