
Now Microsoft ports Windows 10, Linux to homegrown CPU design

MSR's E2 processor EDGEs into public view... with a little help from Qualcomm, too

Updated Microsoft has ported Windows 10 and Linux to E2, a homegrown processor architecture it has spent years developing, mostly in secret.

As well as the two operating systems, the US giant's researchers say they have also ported Busybox and FreeRTOS, plus a collection of toolkits for developing and building applications for the processor: the standard C/C++ and .NET Core runtime libraries, the Windows kernel debugger, Visual C++ 2017's command line tools, and .NET's just-in-time compiler RyuJIT.

Microsoft has also ported the widely used LLVM C/C++ compiler and debugger, and related C/C++ runtime libraries. The team wanted to demonstrate that programmers do not need to rewrite their software for the experimental chipset, and that instead programs just need to be recompiled – then they are ready to roll on the new technology.

The design of E2 is a radical departure from the computer chips designed by Intel, Arm, AMD, and others. It uses an instruction set architecture known as explicit data graph execution, aka EDGE, which isn't to be confused with Microsoft's Edge browser.

The Register understands from people familiar with its development that prototype E2 processors exist in the form of FPGAs – chips with reprogrammable circuitry that are typically used for electronic engineering development. For example, a dual-core implementation on Xilinx FPGAs exists, clocked at 50MHz. The team has also developed a cycle-accurate simulator capable of booting Windows and Linux, and running applications.

Qualcomm researchers were evaluating two EDGE chip designs with Microsoft: a small R0 core, and an R1 core running up to 2GHz fabricated using a 10nm process. The project, we must stress, is very much a work in progress.

Join the queue

On the outside, a typical mainstream processor appears to operate like, let's say, a garbage recycling machine with a single conveyor belt going into it: trash is fed, piece by piece, into the jaws of the iron beast to consume. Inside, there is a robot that sorts the waste into groups and sends them down their own separate conveyor belts to be processed by different parts of the machine. One belt will take plastics, another glass, another food, and so on.

Today's computer, server, and smartphone processors do the same but with software instructions. Take, for example, the Arm Cortex-A76, which splits fetched instructions into eight conveyor belts that go into the rest of the core: four for integer math operations, two for floating point math, and two for accessing data in memory. The frontend of the core tries to place instructions on the belts so there's always something being processed along each of these lanes, even if it means executing the code out of order.

You want to avoid situations where, for example, one of the integer math units is sitting there without anything to do while work is queuing in other lanes: it's a waste of resources. And if an instruction in one lane relies on the output of an instruction behind it in another lane, then processing will be held up. All these logistics are scheduled and resolved a billion times a second. The processors get the job done.
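To make the conveyor-belt picture concrete, here is a toy model of that kind of out-of-order issue logic. It is only a sketch under loud assumptions: every instruction takes one cycle, the lane counts simply mirror the four integer, two floating-point, and two memory lanes described above, and the names `Instr`, `LANES`, and `schedule` are made up for illustration. It is not how the Cortex-A76, or any real core, is actually implemented.

```python
from collections import namedtuple

# Hypothetical instruction record: a name, the kind of lane it needs,
# and the names of the instructions whose results it depends on.
Instr = namedtuple("Instr", "name kind deps")

# Lane counts taken from the article's description; everything else is assumed.
LANES = {"int": 4, "fp": 2, "mem": 2}


def schedule(program, lanes=LANES):
    """Return a list of cycles, each a list of instruction names issued."""
    done = set()            # results available from earlier cycles
    pending = list(program)
    cycles = []
    while pending:
        free = dict(lanes)  # lanes free at the start of this cycle
        issued = []
        for ins in pending:
            # Issue if a suitable lane is free and all inputs are ready,
            # even if older instructions are still waiting (out of order).
            if free[ins.kind] > 0 and all(d in done for d in ins.deps):
                free[ins.kind] -= 1
                issued.append(ins)
        if not issued:
            raise ValueError("dependency can never be satisfied")
        for ins in issued:
            pending.remove(ins)
        # Results only become visible to later cycles.
        done.update(ins.name for ins in issued)
        cycles.append([ins.name for ins in issued])
    return cycles


program = [
    Instr("load_a", "mem", ()),
    Instr("load_b", "mem", ()),
    Instr("load_c", "mem", ()),            # must wait: only two memory lanes
    Instr("add", "int", ("load_a", "load_b")),
    Instr("inc", "int", ()),               # independent, can jump the queue
    Instr("mul", "fp", ("add", "load_c")),
]

for number, names in enumerate(schedule(program), start=1):
    print(number, names)
```

In this toy run, `inc` is issued in the first cycle ahead of the older `load_c` and `add`, which are held up by a full lane and an unmet dependency respectively.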

However, there may be a better way: the EDGE way, as used in the E2. It works by breaking up programs into blocks of simple instructions that can be safely executed together as atomic transactions without data dependencies holding up processing. Within each block, the code has access to its own set of private registers, avoiding having to access a global core-wide register file. The code is also annotated by the compiler to describe the flow of data through the program, allowing the CPU to schedule instruction blocks efficiently.
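As a rough illustration of what describing the flow of data can mean, the sketch below models a single block in which each instruction names the instructions that consume its result, and fires as soon as all of its operands have arrived. The encoding (`needs`, `targets`, operand slots) and the function `run_block` are hypothetical: the real E2 instruction format is mostly secret, so treat this as a generic dataflow toy rather than Microsoft's design.

```python
import operator

# Hypothetical block computing (a + b) - (c * d).
# Each instruction lists its consumers as (instruction, operand slot) pairs.
BLOCK = {
    "i0": {"op": operator.add, "needs": 2, "targets": [("i2", 0)]},
    "i1": {"op": operator.mul, "needs": 2, "targets": [("i2", 1)]},
    "i2": {"op": operator.sub, "needs": 2, "targets": []},
}


def run_block(block, initial):
    """Fire instructions as operands arrive; return (results, firing order)."""
    slots = {name: [None] * ins["needs"] for name, ins in block.items()}
    filled = {name: 0 for name in block}
    ready, fired, results = [], [], {}

    def deliver(name, slot, value):
        # Place a value in an operand slot; the instruction becomes
        # ready once every slot has been filled.
        slots[name][slot] = value
        filled[name] += 1
        if filled[name] == block[name]["needs"]:
            ready.append(name)

    for name, slot, value in initial:
        deliver(name, slot, value)

    while ready:
        name = ready.pop(0)
        value = block[name]["op"](*slots[name])
        results[name] = value
        fired.append(name)
        # Forward the result directly to the consumers named by the producer.
        for target, slot in block[name]["targets"]:
            deliver(target, slot, value)
    return results, fired


# The operands for i1 happen to arrive before those for i0.
arrivals = [("i1", 0, 4), ("i1", 1, 5), ("i0", 0, 2), ("i0", 1, 3)]
results, fired = run_block(BLOCK, arrivals)
print(fired, results["i2"])
```

Here the firing order follows the arrival of data, not the order in which the instructions were written down, which is the property that lets hardware of this style keep many small execution units busy.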

And, crucially, with many small execution units within a core simultaneously processing these simple blocks, many instructions can be executed at once. Rather than eight conveyor belts as in the Cortex-A76, imagine 32 or more, as is the case with the aforementioned Qualcomm R1 design. The R1 is a 32-instruction-wide out-of-order processor blueprint, and the R0 is eight-wide.

Overall, the aim of this super-RISC approach is to run software faster than rival architectures.

History

Microsoft has been quietly working on EDGE processors since roughly 2010 in its research labs. The technology started life, though, in the early 2000s at the University of Texas at Austin, in the US, as TRIPS – the Tera-op, Reliable, Intelligently adaptive Processing System. The Tera-op part refers to the goal of producing a 1 TFLOPS processor, one that could achieve one trillion floating-point math operations per second. Back then, that was a lot of speed; today's graphics processors and specialist hardware accelerators can run faster than that. Only now are top-end general purpose CPUs for your computer approaching or exceeding the TFLOPS barrier.

The TRIPS project managed to produce and demonstrate an ambitious prototype chip before the research effort wound down by the end of the decade. Knowhow, experience, and architectural ideas from TRIPS made their way into Microsoft's R&D labs, and were distilled into what is now the E2 project, which aims to outpace today's Intel and Arm cores using its novel design.

And now

Although E2 development has been ongoing for several years, three significant things happened this month. First, the team revealed Windows 10 has been ported to the architecture along with a hefty amount of support materials for application developers, allowing them to build apps for the platform. In October 2017, the researchers said they were able to get Linux booting.

Second, it emerged that US chip design giant Qualcomm was collaborating with Microsoft. Third, Microsoft's website doesn't have a lot of information about E2 – and what was online now isn't. Last week, it curiously removed a page about the work, leaving the URL to redirect to an unrelated project.

At this year's International Symposium on Computer Architecture, held this month in California, Microsoft researchers Doug Burger and Aaron Smith, and Greg Wright, Qualcomm's senior director of engineering in its processor research division, went on stage to talk about their EDGE work, and demonstrate Windows running on an E2 simulator. Burger co-led the TRIPS project, and supervised Smith in his PhD work on building software for the CPU design, at the University of Texas at Austin. Now both are at Microsoft Research.

Smith, on his LinkedIn page, noted, as a principal research manager, the extent of his E2 efforts: "I started and lead the E2 project at Microsoft Research which is investigating next-generation EDGE architectures. I grew the project from a one person team to dozens of engineers spanning multiple divisions, companies and countries."

The instruction set for E2 was finalized a couple of years ago, we're told, and is mostly secret for now. However, we do know that each block of code starts with reading in data from the global registers to temporary private registers, then processes that data, and finally writes the result back to the global registers.
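That read, process, write-back shape can be sketched in a few lines. This is a minimal model, assuming that "atomic" means a block's writes reach the global registers all together or not at all; the function `execute_block`, the register names, and the abort behavior are illustrative assumptions, not confirmed details of E2.

```python
def execute_block(global_regs, reads, body, writes):
    """Run one block against a dict of global registers (hypothetical model)."""
    # 1. Read the named global registers into private temporaries.
    private = {reg: global_regs[reg] for reg in reads}
    # 2. Process the data using only the private copies.
    body(private)
    # 3. Write the results back to the global registers in one step.
    #    If body() raised, this line is never reached and the globals
    #    are left untouched (assumed abort behavior).
    global_regs.update({reg: private[reg] for reg in writes})


def multiply(tmp):
    tmp["r3"] = tmp["r1"] * tmp["r2"]


def faulty(tmp):
    tmp["r3"] = 99          # a private write that is never committed
    raise ZeroDivisionError("simulated fault inside the block")


regs = {"r1": 6, "r2": 7, "r3": 0}
execute_block(regs, ["r1", "r2"], multiply, ["r3"])
print(regs["r3"])           # 42

try:
    execute_block(regs, ["r1"], faulty, ["r3"])
except ZeroDivisionError:
    pass
print(regs["r3"])           # still 42: the failed block left no trace
```

Keeping intermediate values in per-block temporaries is also what lets the code inside a block avoid touching a core-wide register file until the block is finished.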

Figure from MSR's EDGE FPGA paper: Example of C code compiled into EDGE instructions ... Source: Jan Gray, Aaron Smith

Microsoft has form in designing chips: for example, the math accelerator in its HoloLens mixed reality goggles. It also works on an awful lot of private research, with some projects making their way into commercial products – such as Drawbridge into SQL Server on Linux – whereas some forever remain lab experiments.

It is speculated this E2 design may be best suited for implementing "soft" processors in FPGAs.

Spokespeople for Microsoft and Qualcomm declined to comment. ®

Updated to add

After publication, a spokeswoman for Microsoft got back to us with some extra details. "E2 is currently a research project, and there are currently no plans to productize it," she said.

"E2 has been a research project where we did a bunch of engineering to understand whether this type of architecture could actually run a real stack, and we have wound down the Qualcomm partnership since the research questions have been answered."

As for the missing webpage, she added: "Given much of the research work has wound down, we decided to take down the web page to minimize assumptions that this research would be in conflict with our existing silicon partners.

"We expect to be able to incorporate learnings from the work into our ongoing research."
