Feeds

Database in depth

A plea for a return to Relational basics

Top 5 reasons to deploy VMware with Tegile

Book review Readers have probably noticed that I was a DBA (Database Administrator) in a former life and I'm an enthusiast for the Relational Model.

However, I must confess that I've always taken my Codd re-digested by Chris Date (I have two editions of his work An Introduction to Database Systems on my shelves, both of which I actually paid for).

Now there is a new book from Date, subtitled Relational Theory for Practitioners, which complements rather than replaces his other works.

Date's premise is that "theory is practical", and I'm with Date here - if theory doesn't deliver practical benefits, you need more useful theories. But I'm also reminded of a VP in a bank who once said to me "you may be right, David, but we must be practical" about some database issue (this was the same bank that thought that buying programmers a book on the theory of testing would be a waste of money).

Anyway, Date's book explores relational theory in depth. This is important, as so many RDBMS' (Relational Database Management Systems) vendors have lost contact with the underlying theory (the book documents several examples of idiocies culled from database vendor's marketing material).

If you want to argue that multi-value "post relational" databases (they're actually pre-relational, mostly derived from Pick) supersede RDBMS then you really should read and understand this book first - much of what you hate about RDBMS may be down to the (usually poor) implementation rather than to relational theory itself.

For instance, Date maintains that storing a multi-valued set of items in an RDBMS column is perfectly OK, and that a proper relational query language should be able to process both the RDBMS as a whole and this embedded set – we have, in effect, a relational database nested in a relational database.

In other words, a set, at some level of abstraction, is atomic, just as we consider a character string to be atomic, even though it can be decomposed into characters - and, of course, an arbitrary list of unrelated items stored in a convenient bucket is still an un-normalised maintenance disaster waiting to happen.

Note that I said "proper relational query language" in that last paragraph. Date is scathing about the compromises in SQL and many RDBMS implementations (as a result, he introduces Tutorial D, a query language that supports relational theory more completely and directly).

And he objects to these compromises not just for theoretical reasons, but because, for example, allowing "duplicate tuples" limits the practical capability of the optimiser to refactor and optimise queries automatically; and the introduction of "nulls" (occasionally) results in practical query results that fail the "common sense" test.

And, did you think that the difference between a base table and a view was that the base table physically exists on disk and a view doesn't; and that joins were inevitably slow?

Date claims that relational theory makes no distinction between "views" and "tables" (neither need exist on disk) and that joins have no associated property of "speed", although a poor implementation (possibly involving the necessity for putting base tables physically on disk) might be slow. None of this will be news to that "post-ralational" RDBMS vendor Caché, of course. Caché data is all stored in sparse arrays and its "relational view" (whether or not it meets Date's stringent criteria) is, at least, a purely logical view.

Date's book is not thick and it is well and entertainingly written, but it is not easy going for all that. Readers will need some familiarity with the mathematical concepts of set theory (or, at least, they'd better not find them frightening) and they'll need to be able to think clearly and logically. They'll also need to put aside their preconceptions and think critically - Codd believed that "nulls" have a place in RDBMS (as do I, for what that's worth); Date argues cogently against this; probably, neither view is "right" in absolute terms.

This all makes the effort you'll put into reading this book worthwhile, but it isn't anything like a Wikipedi summary of RDBMS. I'm not sure I understand its implications fully on a first read but, as Mark Whitehorn put it when we discussed this book, "at least you understand that you ought to understand them".

Date provides exercises (with answers on the web here) which should make it much easier to "self-pace" your study and, unlike some of Date's other books, this really isn't just an academic textbook in my view (perhaps this is why the index isn't perfect – I found at least one reference a page out).

Finally, however, a warning. The back cover of this book promises that reading and understanding it will "set you apart and well above the competition when it comes to getting work done with a relational database". I'm afraid I seriously doubt that, for all sorts of dysfunctional reasons, which wouldn't stop me reading it.

It will make you better informed and wiser, and I'll listen more attentively when you tell me that multi-valued databases are the way to go if you've read and unstood it, but true relational support is so rare these days that knowledge of relational theory may actually be a practical hindrance in some companies.

While the superior man is worrying about relational issues, the hack will be cobbling together something which just about works, mostly and for now. Which may be all that his employer wants – until things go wrong and they wish they'd used someone who actually understood database theory (or at least someone who knew what they didn't know). As usual, "doing it right" is easiest in a mature organisation with enlightened and informed managers – and how common are these today?

Database in Depth

database in depth

Verdict: If you are a DBA or claim to understand databases, then I think you should read and digest this book. You may not agree with all the points Date makes, but you should be able to follow his arguments. And, it is actually pretty readable on the surface, and provides excercises (with answers), which will help you navigate its depths.

Author: C J Date

Publisher: O'Reilly

ISBN: 0-596-10012-4

Media: Book

List Price: £20.95

Buy this book at Register Books at Reg Developer's special discounted price! ®

Providing a secure and efficient Helpdesk

More from The Register

next story
Google+ goes TITSUP. But WHO knew? How long? Anyone ... Hello ...
Wobbly Gmail, Contacts, Calendar on the other hand ...
Preview redux: Microsoft ships new Windows 10 build with 7,000 changes
Latest bleeding-edge bits borrow Action Center from Windows Phone
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
Google opens Inbox – email for people too thick to handle email
Print this article out and give it to someone tech-y if you get stuck
UNIX greybeards threaten Debian fork over systemd plan
'Veteran Unix Admins' fear desktop emphasis is betraying open source
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Redmond top man Satya Nadella: 'Microsoft LOVES Linux'
Open-source 'love' fairly runneth over at cloud event
prev story

Whitepapers

Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.