Original URL: http://www.theregister.co.uk/2013/08/19/ted_codd_90_relational_daddy/
12 simple rules: How Ted Codd transformed the humble database
Near misses and lucky escapes for a multi-billion-dollar baby
Posted in Applications, 19th August 2013 08:32 GMT
Anniversary Edgar – or Ted – Codd is one of the most influential figures in computing. Born 90 years ago today*, Codd – who passed away in 2003 – was the man who first conceived of the relational model for database management.

Relational databases are today ubiquitous – on your PC, in your smartphone, in your bank’s ATMs, inside airline reservation systems – so it’s easy to be blasé about his contribution, but Codd worked in a different world.
You needed to be a programming nerd or rocket scientist with a theoretical background to build even a functioning database.
Codd’s contribution was revolutionary: first, it separated the data from the computing and from the application, and second, it described a framework for storing and retrieving data using simple rows and tables.
It was a model everybody could buy into – and they did.
Codd’s idea was to databases as the GUI from Xerox Parc was to PCs; relational brought databases to the masses. Decades later, analyst IDC pegs the worth of the global relational database market at about $28bn – and it's growing at 7.6 per cent a year. Some of the biggest and most profitable names on the computing scene – Oracle, IBM and Microsoft – are currently working on relational database management systems.
The breakthrough earned Codd a Turing Award.
It’s easy to view success through the present. This happens especially in technology, where fakers look down the lens of the past to pronounce the inevitable brilliance of somebody’s achievements today.
But there was nothing inevitable about relational. In fact, this is one technology that was born as it lives today – on the cusp of uncertainty.
Future users of large data banks
It does seem fitting that somebody like Codd should have invented a process for simplifying databases. An Oxford-educated mathematician, Codd was no academic theoretician or government stuffed shirt of the type who were building many of the room-sized computers at the time.
Codd was an outsider. Born in Poole, Dorset, in 1923, Codd served in the Royal Air Force (Coastal Command) and moved to the United States after World War II. Once there, despite his obvious talents in numbers and flying, he worked as a sales clerk at US retailer Macy’s before going on to teach mathematics at the University of Tennessee – not exactly an Ivy League destination.
His career in computing didn’t begin until he was 26, in 1949, when Codd joined IBM as a programming mathematician. At Big Blue, he worked on the Selective Sequence Electronic Calculator and then IBM’s card-programmed electronic calculator.
Relational happened next, right? Wrong. Personally offended by US senator Joseph McCarthy’s Cold-War Communist-baiting, Codd abandoned IBM and the US entirely in 1953 and went to work across the border in Canada. He only rejoined IBM after running into an ex-colleague four years later.
Having transferred to IBM's research facility in San Jose, California, Codd was on track to work on databases. In 1970, Codd published the paper that changed history, A Relational Model of Data for Large Shared Data Banks (PDF). In the paper, which famously opened: "Future users of large data banks must be protected from having to know how the data is organised in the machine", he proposed replacing the hierarchical or navigational structures used to build databases with tables of rows and columns.
It was simple and sensational. But IBM wouldn't bite.
At the time, databases fell into two camps: IBM's IMS (Information Management System) used a hierarchical system of associating related types under a top-level identifier. So, if you were a bank, for example, this identifier might be a person’s name with all related data – address, children and other accounts – hanging off, underneath. CODASYL was the second approach, and used a navigational database model which allowed you to define your database’s schema and its language.
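To make the hierarchical idea concrete, here is a minimal Python sketch. It is not IMS: the nested-dictionary layout, the customer names and the helper `owners_of_account` are all invented for illustration. It shows that reading down from the top-level identifier is easy, while a question that cuts across the hierarchy needs a hand-written routine that knows exactly how the data is laid out.

```python
# Hypothetical bank data laid out hierarchically: everything hangs off
# a top-level customer name. The structure and values are made up.
customers = {
    "Alice Smith": {
        "address": "1 High Street",
        "children": ["Bob"],
        "accounts": [{"number": "A-100", "balance": 250}],
    },
    "Carol Jones": {
        "address": "2 Low Road",
        "children": [],
        "accounts": [
            {"number": "A-200", "balance": 900},
            {"number": "A-300", "balance": 40},
        ],
    },
}


def owners_of_account(tree, account_number):
    """Find who owns an account by walking down from every top-level key."""
    owners = []
    for name, record in tree.items():
        for account in record["accounts"]:
            if account["number"] == account_number:
                owners.append(name)
    return owners


# Going top-down is a direct lookup...
alice_address = customers["Alice Smith"]["address"]
# ...but going "sideways" means scanning every record with custom code.
owners = owners_of_account(customers, "A-300")
```

If the layout changed, say accounts moved under a branch office instead of a person, every routine like `owners_of_account` would have to be rewritten.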
Imagine your data, frozen
The problems were clear: neither scaled. Data was hardwired into either a top-down hierarchy or a language and schema silo chosen by the database builder.
There were no cross-industry standards for query, never mind for third-party tools, and data portability was near impossible unless both data stores shared the same database structure. In order to query data, you required routines written by humans to perform very specific functions.
If you were working with databases, you also needed to know different languages and structures and were pretty much married to the individual or company that had built your database.
Rows and tables were simple. They were agnostic about type and everybody understood the principle.
The final piece of the relational puzzle slotted into place a year or two later, when IBMers Donald Chamberlin and Raymond Boyce followed up with SEQUEL (Structured English Query Language), which later became SQL, giving physical form to the relational theory.
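As a rough illustration of what that looks like in practice, the sketch below uses Python's built-in `sqlite3` module, a modern SQL engine rather than anything Codd, Chamberlin or Boyce worked on. The table names, column names and sample values are assumptions made up for this example. The point is that the query states what is wanted and leaves it to the database to work out how to fetch it.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two plain tables of rows and columns (hypothetical schema).
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE accounts ("
    " number TEXT PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id),"
    " balance INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice Smith", "1 High Street"), (2, "Carol Jones", "2 Low Road")],
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("A-100", 1, 250), ("A-200", 2, 900), ("A-300", 2, 40)],
)

# Total balance per customer: the tables are related by matching values,
# not by pointers or nesting.
totals = conn.execute(
    """
    SELECT c.name, SUM(a.balance)
    FROM customers AS c
    JOIN accounts AS a ON a.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# The same tables answer a different question without new access code.
owner = conn.execute(
    """
    SELECT c.name
    FROM customers AS c
    JOIN accounts AS a ON a.customer_id = c.id
    WHERE a.number = ?
    """,
    ("A-300",),
).fetchone()
```

Neither query says anything about how the rows are stored or in which order they should be scanned, which is the protection the opening line of Codd's paper asks for.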
As far as IBM was concerned, however, relational was DOA. The mainframe giant was too heavily invested in IMS and wasn’t about to cannibalise the business. It wasn’t until a decade later, when IBM released SQL/DS and DB2 in 1981 and 1983, that Codd’s employer got into the relational game. By 1985, Codd had outlined his 12 rules for defining a fully relational database.

Ted Codd Photo courtesy: IBM
Stonebraker and Ellison go after IBM's dropped ball
Fortunately for us, others weren’t so flat-footed and relational was snatched from the jaws of defeat by those outside of IBM.
The two people who saw relational's potential were a Berkeley comp sci assistant professor and later serial entrepreneur called Michael Stonebraker – who built Ingres – and an obscure salesman named Larry Ellison, who first had a crack at RDBs at Software Development Laboratories (which later became Relational Software Inc and then Oracle). Both saw the opportunity, and they jumped in with their own relational database products.
The glory years were the 1980s and 1990s, when data was simple and structured, and the market was inflated by the explosion in the personal computer. On the back of PCs there were servers and storage, and with the creation of data came the need to cleanse, analyse and understand that data – it was the time of business intelligence and of the data warehouse.
SQL was ratified as a vendor-neutral standard by the American National Standards Institute (ANSI) in 1986, and six years later came the publication of what’s considered the breakthrough SQL spec, which managed to fill a number of gaps.
Relational blossomed as it sucked on the data of the time: sales numbers, customer stats – text-based information. Relational imposed a rigid structure, but within it was the freedom and fluidity that everybody needed.
As the 2000s approached, relational faced its next challenge: unstructured and non-text-based data – audio, video, graphics – plus the desire of corporate folk to quickly retrieve, serve and understand that data on a huge scale.
SQL was judged too slow thanks to its centralised structure, and a new movement gave birth to NoSQL – databases that use a key value to describe an object and which are therefore known as key-value stores.
Key-value stores are capable of recognising non-text data – music, video, graphics and so on. These stores include Cassandra, Cloudant, CouchDB, SimpleDB and Google AppEngine DataStore.
The use of NoSQL has become widespread thanks to the rise of users like Google, Amazon clouds, Facebook, LinkedIn and Twitter, which serve and store huge amounts of unstructured data using distributed servers.
On the eve of the anniversary of Codd's birth, NoSQL advocates will gather not far from where Codd worked in San Jose, California, for NoSQL Now!.
But for all its advantages there are problems.
There’s no common NoSQL standard: you build on a database-by-database basis. The databases have been developed either by big companies or by database and programming whizzes and are still growing up in terms of their ease of use and management for ordinary end users.
Also, having eschewed SQL, many of the first-generation NoSQL data stores voluntarily surrendered the ACID properties that are the hallmarks of relational and that made it so suited to business.
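ACID stands for atomicity, consistency, isolation and durability. The sketch below illustrates only the first two, using Python's built-in `sqlite3` module as a stand-in for a relational database. The `accounts` table, the `CHECK` rule and the `transfer` helper are assumptions invented for this example. A transfer that would overdraw an account fails halfway through, and the database undoes the half that had already happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule: a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            # Credit first, so a failed debit leaves work to be undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK rule was violated; the credit above has been rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraw_ok = transfer(conn, "alice", "bob", 500)
after_failure = balances(conn)

small_ok = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

After the failed transfer both balances are unchanged, so the money is never in two places or in none. A store without these guarantees leaves that bookkeeping to the application.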
But later NoSQL startups have re-embraced the faith.
The relational database providers, meanwhile, have responded. They've sped up their databases by employing ideas such as in-memory processing, using caching instead of disks for greater speed, while adding support for non-relational architectures like Hadoop – the non-Google spin on MapReduce.
Unnatural acts
On the 90th anniversary of Codd’s birth, his baby is big. Yet, as was characteristic of its birth, relational faces uncertainty.
Such is the health of relational that the database businesses of Oracle, IBM and Microsoft are all growing – with even enterprise-resource-planning software-maker SAP now claiming its own relational business based on in-memory system Hana. And with so much relationally stored data now in existence, Codd’s technology is safe at least until the 180th anniversary of his birth.
But change, as always, is inevitable.
As Michael Stonebraker, whose latest venture is a Big Data analytics punt with the SciDB NoSQL database, told me in 2010: “In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases.” Trying to stretch relational today, though, is an "unnatural act". ®
* Yes, we know Wikipedia says Codd was born on 23 August, 1923, but his Turing Award entry, here, says it was 19 August. The Reg checked with his former employer IBM whose archives department got in touch to say their files had 19 August. We'll go with that over Jimbo Wales... and consider it a warning about Wikipedia's veracity.
