UK micro pioneer Chris Shelton: The mind behind the Nascom 1
...and Clive Sinclair's PgC Intel-beating wonder chip
Sir Clive Sinclair and the PgC7000
Basing the machine on the Transputer would require a host of other chips to go with it, all adding to the cost. Shelton recalls: “We got to the point where we found ourselves asking, why are we putting together a quick video chip to go with this? Why don’t we just have our own processor in the video chip and then we don’t actually need the Transputer? And maybe, if it’s fast enough, we don’t even need the video chip, which is how it turned out.”
Other issues with the Transputer approach were emerging. By 1987, Thorn EMI had acquired Inmos, taken a beating for its trouble, and now wanted rid of the ailing chip maker. And the chip was no longer able to emulate the latest PC technology: by now a 12MHz IBM PC AT. If Sir Clive was to realise his vision of a universal office workstation, he needed a new chip. And, because Sinclair wanted the computer to be as inexpensive to make as possible, the processor would have to be able to double-up as the graphics controller and the laser printer engine too.
In 1988, Shelton and a small team of engineers set out to create just such a chip. He was convinced that a RISC architecture was the way to go in order to create a small chip with as few logic gates as possible: “If you can go fast enough, you can run standard code at standard speed. But by making it an ultra-RISC device, not only will it go faster but it will also be smaller and therefore cheap.”
But he also saw that while a RISC core might well become sufficiently fast to be capable of handling the emulation, it would be hindered by its simplicity and memory access speed. For instance, the CPU would be idle for too long for Shelton's liking, waiting for the next instruction or piece of data to come in from memory. This was before the emergence of out-of-order processors able to re-order their instruction flows according to the availability of data - though at the cost of the highly complex circuitry required to assess which, if any, instructions can be pulled out of the queue early. Of course, implementing out-of-order execution would have required many, many more logic gates and made the chip too large and thus too costly for Sinclair.
Shelton wanted to clock his chip to deliver 200 MIPS of performance, a speed existing chips couldn’t then achieve because of their memory IO limitations. The Transputer could manage 20 MIPS. The upcoming Intel 486, clocked at 50MHz, could only run at between 40 and 50 MIPS. RISC chips generally had higher MIPS ratings but not of the order Shelton wanted.
His solution to the memory bandwidth issue: allow the processor, an “ultra-RISC” core with just a few hard-wired single-byte instructions, to operate at literally whatever clock speed a given chip’s thermal characteristics would allow - essentially automatic over-clocking - and leave the job of keeping it fed with instructions to separate interface controllers built into the chip.
The core had no external clock, or indeed any direct links with outside world. It took instructions and data from an on-die 32-instruction 2.5ns static RAM cache - called the Q Cache - arranged as a simple, circular buffer and kept stocked by the on-chip memory controller using fast RAM-to-RAM data transfers. Other on-die controllers sat between the core and the host system, to manage an I²C bus and to handle interrupts, for instance. The chip had its own video RAM cache.
“It was asynchronous. All the peripheral units were clocked because you have to meet external specs to attach anything to [the chip] but the core was asynchronous and ran at the speed of the day,” he says. Between any two external memory cycles, the core might be able to execute hundreds or even thousands of instructions if they were tightly looped and operating with register-stored data.
The chip had an expandable selection of 40 32-bit registers arranged in five banks named 0, 1, 2, 3 and TOP. When the chip received an interrupt, processor state information was saved to TOP, allowing the register bank to be expanded in future versions of the chip without breaking software compatibility.
On-die RAM, on-die ROM
Shelton’s processor also had its own software on board: a 768-byte ROM packed with hard-coded sub-routines used to implement CISC-style features, all directly accessible by the core by way of an Instruction Fetch Unit able to pull instructions from the cache or the ROM, which also contained the processor boot code. A third of the chip’s area was taken up by the RAM and ROM banks.
Shelton and his team worked in some tricks. The memory controller could support paging and memory access could be accelerated by, where possible, changing only a RAM chip’s column address rather than the row address during fetches; back then it took three times longer to select a row than accessing a column. The upshot: near serial communication with memory and thus fast enough to keep the core’s buffer stocked. If the core ever found itself without instructions, or the cache invalid, it would doze while waiting for an interrupt to signal a freshly filled cache. It could start executing instructions before the Q Cache was full again. Any loss in performance from the core taking a nap would be more than compensated for by the speed with which it tackled the newly cached instructions.
Shelton also chose to implement the chip using the old bipolar rather than the modern CMOS fabrication process because it yielded faster transistors and required fewer lithography masks. The team ran the chip at just 0.25V to deal with bipolar’s otherwise higher-than-CMOS power consumption.
By July 1989, the processor was running in simulation. The following March, the first lithography masks were completed. During 1990, Shelton and his engineers struggled with a flaw in the lithography process and, later, with a bug that emerged in the memory interface, all exposed during the testing of silicon samples fabricated by Ferranti aka GEC Plessey, which had never used its bipolar Collector Diffusion Isolation process for silicon as complex even as Shelton’s before. The team abandoned the original plan to incorporate a dedicated graphics controller and instead built in a dedicated interrupt to allow the chip, now known as the PgC7000 in its prototype form, to operate as a simple CRT controller. Running a 1024 x 768, 8-bit video buffer consumed just one per cent of the CPU time.
Sponsored: Magic Quadrant for Client Management Tools