
On data models, data types and dangerous liaisons

Context, dear boy, context


Comment A data model is a methodology for storing, handling and manipulating data. There are lots of them around. One of the most commonly employed at present is the relational model. Brainchild of Edgar Codd, it rapidly came into favour after he published his seminal paper in 1970. Many of the popular database engines today (for example DB2, Oracle and SQL Server) are based on the relational model.

Data models are often relatively complex entities, made up of a set of interwoven checks, constraints and rules. Any one rule, on its own, may make little sense; but the entire set, working together, should produce a useful and internally consistent system.

The complexity of these models means that it often takes some time to get to grips with them. For reasons that presumably only a psychologist could explain, when some humans learn a complex system they can develop an attachment to it. A fanatical attachment. The data model that the poor unfortunate victim has learnt becomes the “one true way”. There are no longer several data models from which to choose, there is one correct data model and all of the others are, quite clearly, evil. By this point, we aren’t simply in “blue suede shoes” territory; we have entered “work of the devil” land.

This is unfortunate because, in truth, all data models have their pros and cons. This is an inevitable side effect of the interlacing set of rules that make them up. Take the relational model. It has a number of rules, systems and features (for example, referential integrity) that allow it to provide relatively high data integrity. The downside of these rules is that, every time data is added to, or modified in, the database, the rules must be applied to check the integrity of the data. This slows the database down. So the pro also has a con.
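
To make that trade-off concrete, here is a small sketch in standard SQL, using invented table names, of the sort of referential integrity rule the relational model imposes. Every insert or update has to pass the check, which is precisely where the extra work comes from.

    -- Hypothetical schema: an order may only refer to a customer that exists.
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       VARCHAR(100) NOT NULL
    );

    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        OrderDate  DATE NOT NULL,
        FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
    );

    -- This insert succeeds only if customer 42 is already in Customers;
    -- the engine runs that check on every insert and update, and that costs time.
    INSERT INTO Orders (OrderID, CustomerID, OrderDate)
    VALUES (1, 42, '2006-11-01');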

If we move to the multi-dimensional model of data, we can build structures that are tens of thousands of times faster to query than would be possible using a comparable relational model. We gain speed, but we lose many of the data integrity rules.
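
Here is a deliberately crude, purely relational sketch of the idea that makes multi-dimensional structures fast to query: the answers are worked out in advance, so a query becomes a simple lookup. The table and values are invented for illustration; a real OLAP engine is far more sophisticated, but the trade-off is the same: nothing in the structure guarantees that the summary still agrees with the detail it was built from.

    -- Pre-aggregated summary table: fast to query, no integrity rules tying it
    -- back to the detail rows it was derived from.
    CREATE TABLE SalesByMonthRegion (
        SalesMonth CHAR(7)       NOT NULL,   -- e.g. '2006-10'
        Region     VARCHAR(50)   NOT NULL,
        TotalSales DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (SalesMonth, Region)
    );

    -- The query is a single-row lookup rather than a scan-and-aggregate.
    SELECT TotalSales
    FROM   SalesByMonthRegion
    WHERE  SalesMonth = '2006-10'
    AND    Region = 'EMEA';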

Any argument about which of these data models is the ‘best’ is ultimately meaningless, unless we know the context in which it will be deployed. If query processing speed is of enormous importance to the prospective end-user of the system then that is clearly going to influence which of these two data models is ‘best’ in this particular context.

This context is equally important within the structure of any data model; take data types for example. There is a huge range of possible data types out there in the big wide world of databases – integer, binary, text, BLOB, XML, multi-valued and so on.
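
By way of illustration only, here is a SQL Server flavoured table definition showing a few of those families side by side (the table and column names are invented, and other engines spell some of these types differently).

    -- Illustrative only: one column per broad family of data type.
    CREATE TABLE Documents (
        DocID     INT            PRIMARY KEY,   -- integer
        Title     VARCHAR(200)   NOT NULL,      -- character text
        Body      XML            NULL,          -- XML
        Scan      VARBINARY(MAX) NULL           -- binary / BLOB
    );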

Asking for my opinion on, for example, the multi-valued data type (MVDT) is like asking whether I approve of dynamite or not. I don’t have a view either way until I know the context.

Mr Dy-na-mi-tee

"So, Mark, you think it is appropriate to carry dynamite in a crowded street do you?" No, of course not. The urban environment provides a very poor context for explosives. What about using it in controlled conditions to remove tree-stumps from the ground? Fine. (If you are an environmentalist who loves earwigs and feels sorry for the trauma inflicted upon them by the use of dynamite in this way, I apologise in advance.)

In other words, neither dynamite nor MVDTs inherently display any “good” or “bad” qualities whatsoever. Such attributes depend entirely upon context.

Now let’s look at MVDTs in the context of Access. Microsoft has recently announced that its PC-based database engine is to acquire this data type. This is a whole new data type for Access, one which it has so far managed to do without throughout its 14-year life span. Access is a relational database engine. (Please, let’s not get into a discussion about whether it is truly relational or merely close to relational. It is about as relational as any other engine that purports to be relational today.)

One of the rules of the relational model is that all of the data held therein is atomic. In fact, this is one of the core rules of the relational model, by which I mean that it has a huge effect on many of the other components of the model. For example, SQL, the language commonly used to manipulate relational databases, is based entirely around the idea that the data it manipulates will be atomic. Unless we extend the language, it does not have the tools necessary to manipulate non-atomic data, such as that found in MVDTs.
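
A short sketch may make the point clearer. Faced with a repeating attribute, say one student with several phone numbers, the orthodox relational answer is not a multi-valued field but a second table, which unextended SQL can then query with a join. The names below are invented for the example.

    -- The atomic, relational way to store a repeating value.
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        Name      VARCHAR(100) NOT NULL
    );

    CREATE TABLE StudentPhones (
        StudentID   INTEGER     NOT NULL REFERENCES Students (StudentID),
        PhoneNumber VARCHAR(30) NOT NULL,
        PRIMARY KEY (StudentID, PhoneNumber)
    );

    -- Plain SQL handles this quite happily:
    SELECT s.Name, p.PhoneNumber
    FROM   Students s
    JOIN   StudentPhones p ON p.StudentID = s.StudentID;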

You are entirely free to think that this is a good model for data and you are equally free to think that this is a bad model for data. Your opinion (and/or mine) does not change, by one iota, the fact that the core internal rules that govern relational databases happen to forbid multiple values in a single field.

The dialect of SQL that Access uses has no specific operators to handle MVDTs. So the introduction of MVDTs into Access will, I believe, make life more difficult for users of Access. In my view, Access developers should eschew their use, as I said here [and, in fact, we were a little surprised when some readers saw this article as a generic put-down of MVDTs in any context – The Eds].

Now ask me what I think of MVDTs in the context of non-relational databases such as the PICK-related products (the brain-children of Dick Pick; there is some context here). These database products, such as IBM’s U2 (and even InterSystems Caché, which effectively divorces the logical view from the underlying physical implementation and thus supports the PICK MV model, the relational model and several others at the same time), are designed to use MVDTs and have the necessary operators to manipulate the data in them - so, no problem. MVDTs are perfectly valid in this context.

The relational model is just another data model; but the complex rules that comprise it make it a poor environment for MVDTs. Each on its own is fine; but taken together, the relational model and the MVDT form a dangerous liaison.®
