David May, parallel processing pioneer
We salute the architect of the Transputer
Unsung Heroes of Tech
"It's very distressing - I'm watching almost with disbelief. The Americans cannot get it out of their heads that if you're trying to build machines with lots of processors, you don't assume that they all share a common memory. The world doesn't have a common database. We pass messages to one another."
David May, professor of computer science at the University of Bristol, is talking about the current trend in chip design that proliferates cores - Intel's 'Knights Corner' currently runs to 50 processors on a single chip - but has them all dipping into the same memory pool.
David May today
"The memory is a bottleneck even in a well-designed single-processor system," he says. May should know. He was the chief architect of the UK's famous foray into parallel computing way back in the 1980s - the Inmos Transputer.
Inmos was a government-funded IT company founded in 1978. Its mission was the implementation of a radically new microprocessor architecture: a complete computer-on-a-chip, comprising a fast, low-complexity CPU, substantial local memory and agile input-output circuitry to enable it to communicate efficiently with other Transputers.
The key idea was to create a component that could be scaled from use as a single embedded chip in dedicated devices like a TV set-top box, all the way up to a vast supercomputer built from a huge array of interconnected Transputers.
Connect them up and you had what was, for its era, a hugely powerful system, able to render Mandelbrot Set images and even do ray tracing in real time - a complex computing task only now coming within reach of the latest GPUs, but solved by British boffins 30-odd years ago.
Cutting the cord
To understand where May is coming from, we need to wind back to the Warwick University robotics labs of the early 1970s. A machine with articulated arms and a Cyclops eye trundles across the floor towards a plastic cup on the lab bench. In the corner of the room, a line printer plugged into a DEC PDP-11 raps out the word "CUP". The robot reaches out towards the cup and tries to pick it up.
But May isn't looking at the robot or the cup. He's focused on the thick umbilical that connects the robot to the minicomputer. It's a big, big bundle of cables, a pair for each of the sensors in the robot feeding data back to the PDP-11, and a pair for each of the robot's actuators. All those wires are telling May the computer's in the wrong place - it should be inside the robot.
On-die memory: the Transputer had it 30-odd years before it became fashionable
Instantly, it strikes him that that's wrong too. You don't want one big PDP-11 in the robot, you want lots of tiny PDP-11s, one for each of the sensors and actuators. And a way of networking them all together.
Amen. Message passing is where it's at.
As someone who has to constantly deal with multithreading hell, I fully agree. Debugging other people's synchronization errors, mutex and semaphore issues, performance cratering due to shared access issues, ararararargh. Replace it all with message queues and it's so much more deterministic, and faster as well. The queues have to be synced and high performance, but that's a single point to optimize. You can still share huge areas of memory (if you do have shared memory) by passing pointers, using the messages as permission to access them, and doing that infrequently.
Something with super-fast efficient message passing like Barrelfish just makes me weak in the knees - not worrying about cache coherency is a great thing. So c'mon, don't say 'The Americans' when you just mean Intel. Even Microsoft realizes Intel is dogging the wrong bone again, like they did with the P4 before the Israelis showed them the right way.
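The queue-as-permission idea in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions, not anyone's production design: Python's standard `queue.Queue` stands in for the synchronised message queue, a `bytearray` reference stands in for a pointer into shared memory, and the names `worker` and `double_all` and the `None` end-of-work sentinel are hypothetical. The convention, which the code does not enforce, is that whoever last received a buffer from a queue is the only party allowed to touch it.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Process buffers received from inbox and hand each one back via outbox."""
    while True:
        buf = inbox.get()      # receiving the message grants permission to use buf
        if buf is None:        # hypothetical sentinel meaning "no more work"
            break
        for i in range(len(buf)):
            buf[i] *= 2        # mutate in place; no lock, we are the sole owner
        outbox.put(buf)        # sending the message gives the permission back


def double_all(buffers):
    """Send each buffer to one worker thread and collect them in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for buf in buffers:
        inbox.put(buf)         # by convention the sender stops touching buf here
    inbox.put(None)
    # A single worker draining a FIFO queue returns results in sending order.
    results = [outbox.get() for _ in buffers]
    thread.join()
    return results
```

Only the references travel through the queues, so the buffers themselves are never copied; the queue is the single synchronised point the comment talks about optimising.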
We thought it was a vision of the future
In the early '80s I ran a series of symposia for IBM UK "thought leaders" at Cambridge University. The theme of the first one was "Change" and we invited Inmos along to demonstrate the transputer technology, which some of us thought IBM should invest in.
At around that time IBM was proud to have produced a complex ray-traced image of a Newton's Cradle sitting on a chess board. This was done in under 24 hours on a high-end general-purpose mainframe.
Inmos demonstrated the same image before our eyes in minutes using a couple of shoebox-sized pieces of hardware. They also demonstrated how the performance could be increased by simply adding more transputers without powering off the machine.
Following the demonstration the group discussion decided that it was obviously done with smoke and mirrors and had no commercial value!
I learned a lot from that presentation, mainly to make sure that change was introduced in small increments. This was proof of Clarke's Third Law - "Any sufficiently advanced technology is indistinguishable from magic."
I ought to point out that the Occam language is thriving beyond Bristol. A group at Allegheny College in the USA (see http://transterpreter.org/) have ported an Occam dialect to Atmel's ATmega328 microcontroller chip and appear to have generated a lot of interest among users of this tiny low-cost device - who include robot makers, animatronic artists and hobbyists. And I keep nudging its name as often as I can get away with it in PC Pro (see for example http://www.pcpro.co.uk/features/357853/how-to-build-a-computer-smarter-than-a-us-president).
Parallel processing is like Christianity
It has not been tried and found wanting, it has been found difficult and left untried (as Chesterton put it). Now we're looking at dozens of cores on a chip, but can we make effective use of them? On a server supporting multiple independent sessions (and probably with some VM going on as well) we can - but in a desktop or a supercomputer dedicated to one task?
Get real, Inmos was a disaster zone
Why do we remember Britain's technical and commercial failures through rose-tinted glasses?
I was a system design engineer when the transputer was launched. It was nowhere near competitive. It was targeted at number-crunching tasks, but the performance of the individual processors was so poor that we calculated we would need more than 200 to replace the single bit-slice processor we had at the time. DSP devices were starting to appear at the time and they made a lot more sense. Developing a distributed message-passing algorithm is much harder than developing on more conventional platforms, so we estimated a 5 times increase in development effort.
Selling a product in which the power consumption, space and cost are two orders of magnitude greater than competing technologies and the development costs are one order of magnitude greater is not smart.
We did use large quantities of Inmos DRAM because the nibble-mode parts were at the time the fastest available; however, we quickly discovered a design flaw that resulted in pattern-sensitive corruption of the memory. Inmos denied this for more than a year, despite us being one of their largest customers and the fact that we could reliably reproduce the problem right from the start on approximately 1% of all devices.
Poor customer service
No mystery about why the company no longer exists.
Let's produce commercially viable, technologically innovative products with high quality, not fantasise about products that were badly conceived and poorly executed.