Codd almighty! How IBM cracked System R

The tech team that made relational databases a reality

Reducing security risks from open source software

Few teams have maintained such a fierce community spirit as the IBM pioneers of System R. In Silicon Valley in the early 1970s, this pioneering team proved that a relational database could actually work.

It's a fascinating story, best known today because IBM failed to capitalise on its research. But it's also a timeless one, since the team had to put chalkboard theories into real working code, in the face of widespread skepticism that a computer could actually outperform a human. And that's an engineering challenge that lives on today.

The story goes like this.

Databases had evolved in response to developments in storage - but not always very quickly. The first databases were implemented on punch cards, with a sequential file technology, before the term existed.

Yet when interactive storage came along, the same sequential model followed to the new medium. The newfangled magnetic disks and drums gave programmers fast interactive storage, which in turn, should have permitted more sophisticated interactive queries. But what users got was really "the same but quicker".

By 1970, the leading commercial database of its time was a hierarchical one: IBM's IMS (Information Management System) had been developed to track inventory for the Apollo moon shot. Yet ground-breaking systems like the mainframe timesharing OS Multics were introducing concepts such as hierarchical file systems and dynamic linking - and showing the way for how sophisticated computers would soon become. Could databases keep up?

In the hierarchical IMS world, pieces of data were linked in a tree-like structure. It's still a success today. The relationships between items of data were easy to draw - but laborious to maintain. These hierarchical databases had other limitations. In a hierarchical model, the location of the data had to be known, the physical links had to be maintained, and making changes to the database were daunting. Queries were crude. So despite the advent of "direct action drives", manipulating the data appeared anything but direct.

As ever in computer science, it was widely appreciated that creating a new level of abstraction might cost something in terms of efficiency, but yield potentially huge payoffs in ease of use and the applications for which data could be put. There were two contenders in the research community. But which one would prevail?

"There were two camps, and they were warring with each other. Each camp couldn’t understand what the other was talking about. They had completely different assumptions about what was important," recalled Jim Gray, then a Berkeley PhD who had joined IBM research.

One approach was the network model, introduced to the world in 1969 by Charles Bachman, who had developed one of the first DBMS for General Electric in 1960. Bachman, who will be 90 this year, contributed to many areas of database work including multiuser and programming and in his sixties created the first CASE tools. Bachman described the programmer as a navigator (ref: ACM Turing Award Lecture 1973). In Bachman's navigational or "network" model, records had two-way pointers.

"We have spent the last 50 years with Ptolemaic information systems. These systems, and most of the thinking about systems, were based on a 'computer centered' concept," said Bachman, accepting the ACM's Turing Award in 1973 - you can read his speech here, it gives a great insight into the time. And check out a 2011 interview with The Register, in which he reflects on his career.

Another approach was even more ambitious, and it was advanced by an expat British mathematician and wartime RAF pilot, Ted Codd. Codd's approach was rather different, favouring a declarative model. This meant the programmer "declared" relationships and the computer would be expected to implement them in bits and bytes. Nothing below that should concern the user.

Codd had developed two languages – a relational algebra and a relational calculus – to express extremely complex queries.

"Codd who had some kind of strange mathematical notation, but nobody took it very seriously," mused Don Chamberlin, in our 2003 Reg obituary of Codd. He would become a research colleague of Codd's, and co-inventor of SQL.

It seemed unthinkable that IBM, which then pretty much was the commercial data processing industry, and which had invented so much direct access storage, would be deaf to the idea. And it wasn't.

A 1962 postcard of IBM's San Jose research lab, home of the disk drive and the relational database amongst other inventions. The researchers moved to a 700-acre site in the hills at Almaden in 1986. Much of Cottle Road was destroyed in a suspicious fire in 2008.

Originally data was just stuff that belonged to an application, and even though the commercial database was an IBM success, a little of that was reflected at mighty IBM, too. In 1973 Big Blue decided to do something about this, and consolidated its database research in San Jose, California. IBM gave its research staff beautiful buildings in serene locations - and 5600 Cottle Road, San Jose was in keeping with tradition. Codd had joined IBM's research labs in 1970, and the move brought him into contact to some clever engineers.

"Codd gave a speech [to us] where he said, 'Sure, you can express a question by writing a navigational plan. But if you think about it, the navigational plan isn’t really the essence of what you are trying to accomplish. You are trying to find the answer to some question, and you are expressing the question as a plan for navigating. Wouldn’t it be nice if you could express the question at a higher level and let the system figure out how to do the navigation?’" recalled Chamberlin in a 2001 interview.

"That was Codd’s basic message. He said, ‘If you raise the level of the language that you use to ask questions to a higher and less procedural level, then your question becomes independent of the plan."

Codd only makes one citation in his 1970 paper, referencing work by David Childs on set theory published in 1968. One of the unsung heroes of the story, Childs had been at US defence research lab ARPA in 1965, and at the time ARPA wanted to think about treating data mathematically.

"Childs' 1968 papers and Codd's 1970 paper discussed structure (independent sets, no fixed structure, access by name instead of by pointers) and operations (union, restriction, etc). Childs' papers included benchmark times for doing set operations on an IBM 7090. Codd's 1970 paper introduced normal forms, and his subsequent papers introduced the integrity rules," notes author Ken North.

Access to Childs' 1968 paper was restricted, however. Childs himself would found a successful database company in 1970, with a former president of Chrysler, which was bought by Hitachi in 1984.

Codd got to work with a core team that soon included Gray.

Codd's ideas were considered outré by many

"Ted’s work was mainly of academic interest I would say,” Don Chamberlin reflected later. "It was considered to be a little bit out of the mainstream, somewhat mathematical and theoretical in its flavour."

The industry was developing some standards through a consortium referred to as the CODASYL model - also called the DBTG (Database Task Group) - to create a standard database language.

Chamberlin added: "CODASYL was based on a network data model. It was a little bit more general than the hierarchic data model because it didn’t have the constraint that data had to be organized into a hierarchy—the records could be organised in whatever way you like."

This was fine in theory, but threw a massive research problem at the engineers to crack a working implementation. So IBM created a project with a dozen PhDs: System R (R for Relational). This would prove a relational database was possible, and "not only possible but efficient".

The Power of One eBook: Top reasons to choose HP BladeSystem

More from The Register

next story
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
Apple fanbois SCREAM as update BRICKS their Macbook Airs
Ragegasm spills over as firmware upgrade kills machines
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
Mozilla fixes CRITICAL security holes in Firefox, urges v31 upgrade
Misc memory hazards 'could be exploited' - and guess what, one's a Javascript vuln
Put down that Oracle database patch: It could cost $23,000 per CPU
On-by-default INMEMORY tech a boon for developers ... as long as they can afford it
Google shows off new Chrome OS look
Athena springs full-grown from Chromium project's head
Apple: We'll unleash OS X Yosemite beta on the MASSES on 24 July
Starting today, regular fanbois will be guinea pigs, it tells Reg
prev story


Top three mobile application threats
Prevent sensitive data leakage over insecure channels or stolen mobile devices.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.