Feeds

On data models, data types and dangerous liaisons

Context, dear boy, context

Internet Security Threat Report 2014

Comment A data model is a methodology for storing, handling and manipulating data. There are lots of them around. One of the most commonly employed at present is the relational model. Brainchild of Edgar Codd, it rapidly came into favour after he published his seminal paper in 1970. Many of the popular database engines today (for example DB2, Oracle and SQL Server) are based on the relational model.

Data models are often relatively complex entities, made up of a set of interwoven checks, constraints and rules. Any one rule, on its own, may make little sense; but the entire set, working together, should produce a useful and internally consistent system.

The complexity of these models means that it often takes some time to get to grips with them. For reasons that presumably only a psychologist could explain, when some humans learn a complex system they can develop an attachment to it. A fanatical attachment. The data model that the poor unfortunate victim has learnt becomes the “one true way”. There are no longer several data models from which to choose, there is one correct data model and all of the others are, quite clearly, evil. By this point, we aren’t simply in “blue suede shoes” territory; we have entered “work of the devil” land.

This is unfortunate because, in truth, all data models have their pros and cons. This is an inevitable side effect of the interlacing set of rules that make them up. Take the relational model. It has a number of rules, systems and features (for example, referential integrity) that allow it to provide relatively high data integrity. The downside of these rules is that, every time data is added to, or modified in, the database, the rules must be deployed to check the integrity of the data. This slows the database down. So the pro also has a con.

If we move to the multi-dimensional model of data, we can build structures that are tens of thousands of times faster to query than would be possible using comparable relational model. We gain speed - but we lose many of the data integrity rules.

Any argument about which of these data models is the ‘best’ is ultimately meaningless, unless we know the context in which it will be deployed. If query processing speed is of enormous importance to the prospective end-user of the system then that is clearly going to influence which of these two data models is ‘best’ in this particular context.

This context is equally important within the structure of any data model; take data types for example. There is a huge range of possible data types out there in the big wide world of databases – integer, binary, text, BLOB, XML, multi-valued and so on.

Asking for my opinion on, for example, the multi-valued data type (MVDT) is like asking whether I approve of dynamite or not. I don’t have a view either way until I know the context.

Mr Dy-na-mi-tee

"So, Mark, you think it is appropriate to carry dynamite in a crowded street do you?" No, of course not. The urban environment provides a very poor context for explosives. What about using it in controlled conditions to remove tree-stumps from the ground? Fine. (If you are an environmentalist who loves earwigs and feels sorry for the trauma inflicted upon them by the use of dynamite in this way, I apologise in advance.)

In other words, neither dynamite nor MVDTs inherently display any “good” or “bad” qualities whatsoever. Such attributes depend entirely upon context.

Now let’s look at MVDTs in the context of Access. Microsoft has recently announced that its PC-based database engine is to acquire this data type. This is a whole new data type for Access, one which it has so far managed to do without, throughout its 14 year life span. Access is a relational database engine. (Please, let’s not get into a discussion about whether it is truly relational or merely close to relational. It is about as relational as any other engine that purports to be relational today.)

One of the rules of the relational model is that all of the data held therein is atomic. In fact, this is one of the core rules of the relational model; by which I mean that it has a huge effect on many of the other components of the model. For example SQL, the language commonly used to manipulate relational databases, is completely based around the idea that the data it manipulates will be atomic. Unless we extend the language, it does not have the tools necessary to manipulate non-atomic data; such as that found in MVDTs.

You are entirely free to think that this is a good model for data and you are equally free to think that this is a bad model for data. Your opinion (and/or mine) does not change, by one iota, the fact that the core internal rules that govern relational databases happen to forbid multiple values in a single field.

The dialect of SQL that Access uses has no specific operators to handle MVDTs. So the introduction of MVDTs into Access will, I believe, make life more difficult for users of Access. I believe that Access developers should eschew their use, as I said here [and, in fact, we were a little surprised when some readers saw this article as a generic put down of MVDTs in any context – The Eds].

Now ask me what I think of MVDTs in the context of non-relational databases such as the PICK-related products (the brain-children of Dick Pick; there is some context here). These database products, such as IBM’s U2 (and even Intersystems Cache, which effectively divorces the logical view from the underlying physical implementation and thus supports the PICK MV model, the relational model and several others at the same time), are designed to use MVDTs and have the necessary operators to manipulate the data in them - so, no problem. MVDTs are perfectly valid in this context.

The relational model is just another data model; but the complex rules that comprise it make it a poor environment for MVDTs. Each on its own is fine; but taken together, the relational model and the MVDT form a dangerous liaison.®

Security for virtualized datacentres

More from The Register

next story
Microsoft WINDOWS 10: Seven ATE Nine. Or Eight did really
Windows NEIN skipped, tech preview due out on Wednesday
Business is back, baby! Hasta la VISTA, Win 8... Oh, yeah, Windows 9
Forget touchscreen millennials, Microsoft goes for mouse crowd
Apple: SO sorry for the iOS 8.0.1 UPDATE BUNGLE HORROR
Apple kills 'upgrade'. Hey, Microsoft. You sure you want to be like these guys?
ARM gives Internet of Things a piece of its mind – the Cortex-M7
32-bit core packs some DSP for VIP IoT CPU LOL
Microsoft on the Threshold of a new name for Windows next week
Rebranded OS reportedly set to be flung open by Redmond
Lotus Notes inventor Ozzie invents app to talk to people on your phone
Imagine that. Startup floats with voice collab app for Win iPhone
'Google is NOT the gatekeeper to the web, as some claim'
Plus: 'Pretty sure iOS 8.0.2 will just turn the iPhone into a fax machine'
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
The next step in data security
With recent increased privacy concerns and computers becoming more powerful, the chance of hackers being able to crack smaller-sized RSA keys increases.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.