Happy 40th birthday, Intel 4004!

The first of the bricks that built the IT world

Change you can believe in

While the 80286 was essentially an update to the 8086, the "real change" came with the 32-bit 386, Pawlowski said.

"The beauty of it is that it went to large segments," Pawlowski said. "So instead of having the typical 64k segment architecture, they actually could go the full flat address space and go to four gigs."

As he recalls it: "The big problem we were facing with Motorola and the 68K – which was the competition at the time – was they had a flat address space and we were segmented, because that was the architecture we'd chosen to build the 8086-based architecture on."
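To make the segmented-versus-flat contrast concrete, here is a minimal sketch of the address arithmetic involved. It assumes the well-known 8086 real-mode rule (16-bit segment shifted left four bits, plus a 16-bit offset, on a 20-bit address bus) and models the 386's flat case as a single segment with base zero. The function names are hypothetical, chosen only for illustration.

```python
SEGMENT_SPAN = 1 << 16      # a 16-bit offset reaches 64K within one segment
FLAT_SPAN = 1 << 32         # a 32-bit offset reaches 4GB in one flat segment


def real_mode_physical(segment: int, offset: int) -> int:
    """8086-style address: (segment * 16 + offset), wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF


def flat_physical(offset: int, base: int = 0) -> int:
    """Flat-model address: one segment (base assumed 0) plus a 32-bit offset."""
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 32 bits")
    return (base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # Two different segment:offset pairs can name the same physical byte.
    print(hex(real_mode_physical(0x1234, 0x0010)))
    print(hex(real_mode_physical(0x1235, 0x0000)))
    # In the flat model the offset simply is the address.
    print(hex(flat_physical(0x00012350)))
```

In the segmented case, any data structure larger than 64K forces the programmer or compiler to juggle segment registers; in the flat case a single 32-bit pointer covers the whole four gigs.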

Intel 8086: 4.77MHz, 8MHz, or 10MHz; 3-micron process

Pawlowski worked on the first Multibus board built for the 386. On that board, his team added a 64K direct-mapped cache in front of the 386. "It wasn't integrated inside the part," he told us, "but it was a 16MHz clock, and so we were getting to the point where we were starting to see some of the stress points of the memory architecture – memory access patterns, which were 150 nanoseconds." At 16MHz a clock cycle lasts 62.5 nanoseconds, so each 150-nanosecond trip to memory cost the processor more than two cycles.

But with the 64K direct-mapped cache, "We did some pretty nifty little things," he said. "And it ran 16-bit code really well, so that was the real success."
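For readers who want to see what "direct-mapped" means in practice, the following is a rough sketch of the lookup, not a reconstruction of Pawlowski's board. The 64K capacity comes from the article; the 16-byte line size and the class and method names are assumptions made purely for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one line."""

    def __init__(self, size_bytes: int = 64 * 1024, line_bytes: int = 16):
        # line_bytes is an assumed value; the article does not give one.
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines   # None marks an empty line

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # which slot the line must use
        tag = line_number // self.num_lines    # which memory line owns the slot
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag             # evict whatever was there
        return hit


if __name__ == "__main__":
    cache = DirectMappedCache()
    for addr in (0x1000, 0x1004, 0x1000 + 64 * 1024, 0x1000):
        print(hex(addr), "hit" if cache.access(addr) else "miss")
```

Because every address has only one possible slot, the lookup is a single compare, which is what made the scheme practical to build from discrete parts. The cost is that two addresses exactly 64K apart keep evicting each other.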

The 386 was the chip around which Intel started building motherboards. When the 486 came along, it integrated that motherboard cache into the chip itself, and it also integrated the math coprocessor in the 486DX version. The 386 still relied on the separate 387 chip. And, yes, there was a 386DX, but that designation had nothing to do with an on-chip FPU: it distinguished the part with the full 32-bit external data bus from the cheaper 386SX, which had a 16-bit one.

Go figure.

Intel 8088: 4.77MHz or 8MHz, 3-micron process

After the 386 and the 486 came not the 586 but a chip that Intel's marketing department rechristened the Pentium, built on a new microarchitecture known internally as P5.

"That became the first superscalar machine," Pawlowski told us, superscalar being the term of art that describes a processor that has more than one concurrent execution sequence, or pipeline.

"That's where we actually had multiple execution units," he said. "Not necessarily the same, but the scheduler was at least smart enough to look inside the machine, if it had to do an add, had to do a multiply, potentially some type of fetch, or some other type of instruction, it could actually look for places where it could get more locality out of the instruction, out of the machine itself."

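The idea of issuing two instructions at once can be sketched with a toy in-order scheduler. This is a deliberately simplified model, not the P5's actual pairing rules: it assumes two pipelines, pairs adjacent instructions only, and checks a single condition (the second instruction must neither read nor overwrite the register the first one writes). The `Instr` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, for display only
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    """Adjacent instructions may share a cycle if the second one neither
    reads nor overwrites the register the first one writes."""
    return first.dest not in second.srcs and first.dest != second.dest


def schedule(program: List[Instr]) -> List[List[Instr]]:
    """Greedy in-order dual issue: one or two instructions per cycle."""
    cycles = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append([program[i], program[i + 1]])
            i += 2
        else:
            cycles.append([program[i]])
            i += 1
    return cycles


if __name__ == "__main__":
    program = [
        Instr("add", "a", ("b", "c")),
        Instr("mul", "d", ("e", "f")),   # independent of the add: pairs
        Instr("add", "g", ("a", "d")),
        Instr("sub", "h", ("g", "a")),   # needs g: must wait a cycle
    ]
    for n, group in enumerate(schedule(program), start=1):
        print(n, [ins.op for ins in group])
```

In this model the four instructions finish issuing in three cycles rather than four, because only the first two are independent of each other.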