Multivalued datatypes considered harmful

How dangerous can a data type be?

SANS - Survey on application security programs

Increasingly developers are required to write applications that interact with database engines – typically Oracle, SQL Server, DB2, MySQL or Access. In many ways the database engine is pretty much immaterial; no matter what the flavour it’s still simply a matter of tables, columns, rows and a variety of data types; text, memo, BLOB, numeric, whatever. However if you work with Access, a completely new data type is on the horizon for 2007 – multi-valued. Unfortunately, this isn’t just-another-data-type; this is a whole different ball game and a dangerous one – more like rollerball than baseball.

As the name suggests, a multi-valued field is one in which you can place more than one value. So, imagine that you design a table to store information about, say, customers.


CustID FName LName Hobbies
1 Fred Smith Fishing, Rollerball, Hockey
2 Sally Jones Sailing
3 Brian Wilson Gliding, Sailing, Singing, Hockey

The hobbies column is a multi-valued field. In some ways this is very neat because we have created a many-to-many join (many different customers can have many different hobbies) using a single table. The alternative, and traditional way, is to use three tables.


CustID FName LName
1 Fred Smith
2 Sally Jones
3 Brian Wilson


HobbyID Hobbies
1 Fishing,
2 Rollerball
3 Hockey
4 Sailing
5 Gliding
6 Singing


CustID HobbyID
1 1
1 2
1 3
2 4
3 5
3 4
3 6
3 3

Clearly, the solution using three tables is more complex and less intuitive; so what is wrong with the multi-valued data type solution? Well, in his initial set of rules defining relational databases, Ted Codd (the originator of the relational model) forbad their use.

Rule 2, the guaranteed access rule.

Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.

If we use the table name (Customer), Primary key value (1) and column name (Hobbies) we don’t get a single atomic value (such as ‘Fishing’); we get multiple pieces of data (Fishing, Rollerball, Hockey).

Now this is a killer argument if you are a database freak like me (“If Ted Codd forbad it, I want no further truck with your multi-valued data types.”) but I quite understand that, if you are an application developer, the finer points of relational database theory often sound like just so much academic nonsense. If a new feature makes life easier, who cares if it happens to break some arbitrary rule written over 20 years ago?

Fair enough. So let’s look at an intensely practical reason why multi-valued fields are so bad. We query databases using SQL. The design of SQL is based entirely on the assumption that each column contains atomic values. If we run a normal SQL query against our single table solution:

WHERE Hobby = “Rollerball”

It will return zero rows; despite the fact that one of our customers plays rollerball, because there is no row with a field just containing “rollerball”.

If you want a challenge, try to construct the SQL necessary to find the names of the customers who both glide and sail. It is, of course, possible, but the solution is more complex than extracting the same information from the three table structure.

And if you are not convinced that we are plunging into deep water here, imagine that I store foreign key values (in the example shown, into a PRODUCT table) in a multi valued field:


OrderID ODate Products
1 1/1/07 1,5,5,5,5,5,67,434,434,5654
2 1/1/07 45,67,454,454,454,65556
3 2/1/07 2,454,5677

3 Big data security analytics techniques

More from The Register

next story
OpenBSD founder wants to bin buggy OpenSSL library, launches fork
One Heartbleed vuln was too many for Theo de Raadt
Got Windows 8.1 Update yet? Get ready for YET ANOTHER ONE – rumor
Leaker claims big release due this fall as Microsoft herds us into the CLOUD
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Ubuntu 14.04 LTS: Great changes, but sssh don't mention the...
Why HELLO Amazon! You weren't here last time
Patch iOS, OS X now: PDFs, JPEGs, URLs, web pages can pwn your kit
Plus: iThings and desktops at risk of NEW SSL attack flaw
Next Windows obsolescence panic is 450 days from … NOW!
The clock is ticking louder for Windows Server 2003 R2 users
Batten down the hatches, Ubuntu 14.04 LTS due in TWO DAYS
Admins dab straining server brows in advance of Trusty Tahr's long-term support landing
Red Hat to ship RHEL 7 release candidate with a taste of container tech
Grab 'near-final' version of next Enterprise Linux next week
Apple inaugurates free OS X beta program for world+dog
Prerelease software now open to anyone, not just developers – as long as you keep quiet
prev story


Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
3 Big data security analytics techniques
Applying these Big Data security analytics techniques can help you make your business safer by detecting attacks early, before significant damage is done.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Mainstay ROI - Does application security pay?
In this whitepaper learn how you and your enterprise might benefit from better software security.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.