On data models, data types and dangerous liaisons

Context, dear boy, context

Build a business case: developing custom apps

Comment A data model is a methodology for storing, handling and manipulating data. There are lots of them around. One of the most commonly employed at present is the relational model. Brainchild of Edgar Codd, it rapidly came into favour after he published his seminal paper in 1970. Many of the popular database engines today (for example DB2, Oracle and SQL Server) are based on the relational model.

Data models are often relatively complex entities, made up of a set of interwoven checks, constraints and rules. Any one rule, on its own, may make little sense; but the entire set, working together, should produce a useful and internally consistent system.

The complexity of these models means that it often takes some time to get to grips with them. For reasons that presumably only a psychologist could explain, when some humans learn a complex system they can develop an attachment to it. A fanatical attachment. The data model that the poor unfortunate victim has learnt becomes the “one true way”. There are no longer several data models from which to choose, there is one correct data model and all of the others are, quite clearly, evil. By this point, we aren’t simply in “blue suede shoes” territory; we have entered “work of the devil” land.

This is unfortunate because, in truth, all data models have their pros and cons. This is an inevitable side effect of the interlacing set of rules that make them up. Take the relational model. It has a number of rules, systems and features (for example, referential integrity) that allow it to provide relatively high data integrity. The downside of these rules is that, every time data is added to, or modified in, the database, the rules must be deployed to check the integrity of the data. This slows the database down. So the pro also has a con.

If we move to the multi-dimensional model of data, we can build structures that are tens of thousands of times faster to query than would be possible using comparable relational model. We gain speed - but we lose many of the data integrity rules.

Any argument about which of these data models is the ‘best’ is ultimately meaningless, unless we know the context in which it will be deployed. If query processing speed is of enormous importance to the prospective end-user of the system then that is clearly going to influence which of these two data models is ‘best’ in this particular context.

This context is equally important within the structure of any data model; take data types for example. There is a huge range of possible data types out there in the big wide world of databases – integer, binary, text, BLOB, XML, multi-valued and so on.

Asking for my opinion on, for example, the multi-valued data type (MVDT) is like asking whether I approve of dynamite or not. I don’t have a view either way until I know the context.

Mr Dy-na-mi-tee

"So, Mark, you think it is appropriate to carry dynamite in a crowded street do you?" No, of course not. The urban environment provides a very poor context for explosives. What about using it in controlled conditions to remove tree-stumps from the ground? Fine. (If you are an environmentalist who loves earwigs and feels sorry for the trauma inflicted upon them by the use of dynamite in this way, I apologise in advance.)

In other words, neither dynamite nor MVDTs inherently display any “good” or “bad” qualities whatsoever. Such attributes depend entirely upon context.

Now let’s look at MVDTs in the context of Access. Microsoft has recently announced that its PC-based database engine is to acquire this data type. This is a whole new data type for Access, one which it has so far managed to do without, throughout its 14 year life span. Access is a relational database engine. (Please, let’s not get into a discussion about whether it is truly relational or merely close to relational. It is about as relational as any other engine that purports to be relational today.)

One of the rules of the relational model is that all of the data held therein is atomic. In fact, this is one of the core rules of the relational model; by which I mean that it has a huge effect on many of the other components of the model. For example SQL, the language commonly used to manipulate relational databases, is completely based around the idea that the data it manipulates will be atomic. Unless we extend the language, it does not have the tools necessary to manipulate non-atomic data; such as that found in MVDTs.

You are entirely free to think that this is a good model for data and you are equally free to think that this is a bad model for data. Your opinion (and/or mine) does not change, by one iota, the fact that the core internal rules that govern relational databases happen to forbid multiple values in a single field.

The dialect of SQL that Access uses has no specific operators to handle MVDTs. So the introduction of MVDTs into Access will, I believe, make life more difficult for users of Access. I believe that Access developers should eschew their use, as I said here [and, in fact, we were a little surprised when some readers saw this article as a generic put down of MVDTs in any context – The Eds].

Now ask me what I think of MVDTs in the context of non-relational databases such as the PICK-related products (the brain-children of Dick Pick; there is some context here). These database products, such as IBM’s U2 (and even Intersystems Cache, which effectively divorces the logical view from the underlying physical implementation and thus supports the PICK MV model, the relational model and several others at the same time), are designed to use MVDTs and have the necessary operators to manipulate the data in them - so, no problem. MVDTs are perfectly valid in this context.

The relational model is just another data model; but the complex rules that comprise it make it a poor environment for MVDTs. Each on its own is fine; but taken together, the relational model and the MVDT form a dangerous liaison.®

Boost IT visibility and business value

More from The Register

next story
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Munich considers dumping Linux for ... GULP ... Windows!
Give a penguinista a hug, the Outlook's not good for open source's poster child
Intel's Raspberry Pi rival Galileo can now run Windows
Behold the Internet of Things. Wintel Things
Linux Foundation says many Linux admins and engineers are certifiable
Floats exam program to help IT employers lock up talent
Microsoft cries UNINSTALL in the wake of Blue Screens of Death™
Cache crash causes contained choloric calamity
Eat up Martha! Microsoft slings handwriting recog into OneNote on Android
Freehand input on non-Windows kit for the first time
Linux kernel devs made to finger their dongles before contributing code
Two-factor auth enabled for Kernel.org repositories
prev story


Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
7 Elements of Radically Simple OS Migration
Avoid the typical headaches of OS migration during your next project by learning about 7 elements of radically simple OS migration.
BYOD's dark side: Data protection
An endpoint data protection solution that adds value to the user and the organization so it can protect itself from data loss as well as leverage corporate data.
Consolidation: The Foundation for IT Business Transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?