DBMS pioneer Bachman: 'Engineers have more fun than academics'

Original URL: https://www.theregister.com/2011/11/30/charlie_bachman_interview/

Very large databases are so '70s

Posted in Databases, 30th November 2011 17:41 GMT

Fifty years ago this month a young engineer at mega corp General Electric was on the verge of completing a project that would change technology.

Charlie Bachman was working on Integrated Data Store (IDS), the first disk-based database management system (DBMS) that multiple applications could access simultaneously. IDS made data independent of applications.

IDS came to be acknowledged as the world's first "proper" database and evolved to become one of the most important components of large data processing systems. The network model IDS employed, meanwhile, is still operational in more than a thousand applications worldwide.

BT, for example, processes 275 million transactions a day using the most successful descendant of IDS – CA-IDMS.

In 1973, just three years after IBM's Ted Codd published his seminal paper on the relational model, Bachman was awarded the prestigious Turing Award for his work on DBMS. He also has the distinction of being the senior Distinguished Fellow of the British Computer Society (BCS).

What makes this even more remarkable is the fact Bachman has dyslexia.

Talking to The Reg this month, Bachman reckons his dyslexia may have worked to his advantage: "I found reading harder than writing so I've always been in a situation where I was writing the forward-looking article because I didn't know what others were doing."

He did point out, however: "If you are not careful this can catch up with you because it makes it hard to keep up well with what is going on in parallel."

Fifty years after that initial breakthrough, Bachman is still working – consulting on DBMS, writing a book on data modelling and helping edit a biography by Thomas Haigh of the Charles Babbage Institute.

Unusually for someone who has contributed so many ideas to the development of computing, Bachman says he has never been an "academic". Instead, Bachman preferred a career as a practising engineer in a commercial environment.

"I think engineers have more fun than academics," he tells us. "The next project is always different – a fresh challenge – not like teaching the same thing to a new batch of students every year.

"My career just kind of happened. Rather than taking deliberate steps I just followed the flow. I was a good student at school although not necessarily the best. But I consider myself as a late developer because I was dyslexic."

Born in 1924, Bachman first encountered computers with the US Army in the Pacific in WWII. His Computer History Museum bio says he used fire control computers to aim 90mm anti-aircraft guns. After the war, Bachman earned a master's degree in mechanical engineering and, in 1950, joined Dow Chemical as an engineer. The first seeds of what would become IDS were sown there.

"I suppose it started when I went to work at Dow Chemical. One of my first assignments was to evaluate two valves for a chemical process we had. One valve cost $5,000 more than the other – but it took less power to operate and cost less to maintain.

Turning the expense valve

"So, in the long run, the expensive valve worked out to be more economical. I came up with the idea of equivalent capital value to help figure out what the return on investment could be. We went on to work out sets of tables that engineers could use to work out the equivalent capital value."

The table data was punched onto cards to create a primitive database, which could be used for different analyses.
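
The article does not give Bachman's formula, so the following is only a rough sketch of the kind of calculation he describes. It assumes "equivalent capital value" can be read as the present value of a constant yearly saving in power and maintenance. Only the $5,000 price difference comes from the interview; the annual saving, discount rate and service life are hypothetical figures chosen for illustration.

```python
def equivalent_capital_value(annual_saving, rate, years):
    """Present value of a constant yearly saving.

    Assumption: this standard annuity formula stands in for Bachman's
    method, which the article does not spell out.
    """
    if rate == 0:
        return annual_saving * years
    return annual_saving * (1 - (1 + rate) ** -years) / rate


EXTRA_PURCHASE_COST = 5_000  # from the interview: price gap between the valves
ANNUAL_SAVING = 900          # hypothetical: yearly power and maintenance saving
DISCOUNT_RATE = 0.08         # hypothetical: required rate of return
SERVICE_YEARS = 10           # hypothetical: expected life of the valve

value = equivalent_capital_value(ANNUAL_SAVING, DISCOUNT_RATE, SERVICE_YEARS)
worth_it = value > EXTRA_PURCHASE_COST

print(f"Savings are worth about ${value:,.0f} up front")
print("Buy the expensive valve" if worth_it else "Buy the cheap valve")
```

With these made-up inputs the savings are worth more than the extra purchase price, which matches the conclusion Bachman reached. Precomputing the result for a range of rates and lifetimes would give a lookup table of the sort his engineers used.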

When Bachman moved on to General Electric in 1960, the idea of data being independent of applications had begun to form and, as part of GE's new manufacturing control system (MIACS), IDS began to take shape.

Something borrowed, something new

Bachman describes IDS as "an assemblage of a number of different ideas". Some of these ideas came from abstract concepts such as data hierarchies, data descriptions and randomised addressing. Others grew from earlier attempts to analyse data, such as the 702 Report Generator, developed for the IBM 702, and 9PAC, which was built for the IBM 709. Bachman became involved in developing the 9PAC reporting system through the IBM SHARE user group while still at Dow.

He recalls the way that SHARE worked then was comparable in some ways to today's open source development. Participants gave their time and ideas voluntarily and swapped code – as long as you had several million dollars to buy the machine to run it on.

But it was not only novel software ideas that made IDS possible. Developments in storage technology also played their part and the big enabler was the emergence of "affordable" and reliable disk storage devices. Disk stores had been around since the mid-1950s, but it was not until the early 1960s that it became practical to use them in commercial applications.

Building the programs needed to operate IDS turned out to be a real challenge for Bachman. He had only written a single program previously.

"When I came to develop IDS, I had only one program behind me. We started out using a language called GECOM, which was GE's version of what later became Cobol. For two years we were dealing with GECOM although there was a feature that allowed you to enter the assembly language: GEPLAN."

Even before the MIACS project was completed, Bachman realised that IDS would be important: "MIACS did not go into production until 1965 – but we already had two IDS sites running in 1964 and saw we could put it into production."

Database software derived from IDS's network model continued to grow until the 1980s, when faster hardware enabled the relational model developed by IBM's Ted Codd to take the lead.

Bachman, though, remains sceptical about the use of relational databases in large-scale transaction processing systems: "There is still a large discussion here in my mind. Ideas have a great deal of momentum. Big systems are very difficult to replace with 'new' systems. Otherwise why do a thousand IDMS (IDS) systems still run some very large IBM mainframes around the world?"

It's a hot topic. Others, particularly supporters of NoSQL, would argue that the RDBMS is not suited to large systems. Today those large systems are not mainframes; they are data centres running companies' clouds. When it comes to today's so-called very large databases, Bachman is brief and fairly dismissive. "Oh that was all happening in the 1970s," he tells us.

Passion in the blood

Bachman's later work has taken him into open systems and computer-aided software engineering (CASE) tools, but his passion for database systems has continued. Most recently he has worked with the Cord Blood Registry (CBR) in California to develop special database software for stem cell research.

Bachman's papers and correspondence are archived at the Charles Babbage Institute.

And with his birthday fast approaching next month, Bachman is still thinking outside the box, coming up with fresh insights on databases.

"While working on my data-modelling book I had an insight into what relational DBMS is really all about. Thinking back to the old card-based 702 report writer it was all about flattened files - where all codes are in every record. If you do a relational JOIN you do just that - you flatten the file.

"So," he concludes, "it struck me that relational DBMS is a leap back to a 100-year-old technology – it tracks back to what we used to do with punched cards – aside from SQL as a language to control what is going on." ®