Intel demos real-time code compression for die shrinkage, power saving
'Direct Compressed Execution' to boost affordability of 'Internet of Things'
Research@Intel Intel researchers have developed a way to make the increasingly tiny processors needed to power the impending "Internet of Things" even tinier: compress the code running on them.
"We compress the code, make it smaller, and save area and power of integrated on-die memory," Intel Labs senior researcher Sergey Kochuguev from ZAO Intel A/O in St. Petersburg, Russia, told The Reg at Tuesday's Research@Intel shindig in San Francisco.
The FPGA research die that Kochuguev demoed used a dedicated hardware compression-decompression unit of a mere 20,000 gates that sits on the die between the compute core and on-die memory. The process happens dynamically in real time, and is transparent to the core.
You might reasonably ask what level of latency would be introduced in such a scheme, but Kochuguev told us that it was low enough to make the power and die-size benefits worthwhile: less than 5 per cent as measured by industry-standard EEMBC benchmarks.
The compression unit was able to shrink the code size by around one-third on average, he said, which in most implementations saves far more memory real estate than the 20,000 gates needed for the compression/decompression unit. Kochuguev also emphasized that the unit's size and cost would be constant across different implementations, and would require only three microwatts per megahertz of processing power.
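To put those figures in perspective, here is a back-of-the-envelope sketch in Python. Only the one-third average code-size reduction and the three-microwatts-per-megahertz figure come from Kochuguev; the 64KB code image and 32MHz clock are hypothetical values picked for illustration, and we are assuming the power figure scales linearly with the core clock.

```python
# Rough arithmetic only. The two defaults are the figures quoted in the
# article; everything passed in at the bottom is a hypothetical example.

def compressed_size(code_bytes, saving=1 / 3):
    """Bytes of on-die memory needed after an average one-third reduction."""
    return code_bytes * (1 - saving)


def unit_power_uw(clock_mhz, uw_per_mhz=3.0):
    """Power drawn by the compression unit in microwatts, assuming it
    scales linearly with the clock frequency."""
    return clock_mhz * uw_per_mhz


code_bytes = 64 * 1024   # hypothetical 64KB firmware image
clock_mhz = 32           # hypothetical microcontroller clock

print(f"memory needed: {compressed_size(code_bytes) / 1024:.1f}KB "
      f"instead of {code_bytes / 1024:.0f}KB")
print(f"compression unit power: {unit_power_uw(clock_mhz):.0f} microwatts")
```

Whether the memory saved outweighs the 20,000 gates depends on the memory technology and code size of a given design, which is why the sketch stops short of a gate-for-gate comparison.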
Compress the code and you shrink the die required to house it and save power
Memory mapping, as you might assume, is handled by the compression/decompression unit, and the compression dictionary is hardwired into the unit, as well. "An address-resolution table is built for each binary individually," Kochuguev said, "and it sits next to the compressed-code image. We include its costs into our estimations of code-compression ratio, of course."
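Intel did not disclose the encoding it uses, so the following Python sketch is only a toy model of the general idea: a fixed dictionary plus a per-binary address-resolution table that maps each uncompressed block to its offset in the compressed image. The block size, word size, tag byte, dictionary contents, and two-byte table entries are all assumptions made for illustration, not details of Intel's design.

```python
# Toy model: block-wise dictionary compression with an address table.
# Every constant below is a hypothetical choice, not Intel's.

BLOCK_SIZE = 16   # bytes per independently compressed block (assumed)
WORD_SIZE = 4     # bytes per code word (assumed)
LITERAL = 0xFF    # tag byte: the next WORD_SIZE bytes are stored as-is

# Stand-in for the hardwired dictionary: a few "common" words.
DICTIONARY = [bytes(4), bytes.fromhex("00bf00bf"), bytes.fromhex("70477047")]
INDEX_OF = {word: i for i, word in enumerate(DICTIONARY)}


def compress_block(block):
    """Replace dictionary words with a 1-byte index; keep others as literals."""
    out = bytearray()
    for i in range(0, len(block), WORD_SIZE):
        word = bytes(block[i:i + WORD_SIZE])
        if word in INDEX_OF:
            out.append(INDEX_OF[word])
        else:
            out.append(LITERAL)
            out += word
    return bytes(out)


def decompress_block(image, offset):
    """Expand one block starting at `offset` in the compressed image."""
    out = bytearray()
    pos = offset
    while len(out) < BLOCK_SIZE:
        tag = image[pos]
        pos += 1
        if tag == LITERAL:
            out += image[pos:pos + WORD_SIZE]
            pos += WORD_SIZE
        else:
            out += DICTIONARY[tag]
    return bytes(out)


def build_image(code):
    """Return (address table, compressed image) for one binary."""
    if len(code) % BLOCK_SIZE:
        code += bytes(BLOCK_SIZE - len(code) % BLOCK_SIZE)  # zero-pad
    table, image = [], bytearray()
    for start in range(0, len(code), BLOCK_SIZE):
        table.append(len(image))  # where this block begins when compressed
        image += compress_block(code[start:start + BLOCK_SIZE])
    return table, bytes(image)


def fetch(table, image, address, length=WORD_SIZE):
    """What the core sees: bytes at an *uncompressed* address."""
    block, within = divmod(address, BLOCK_SIZE)
    return decompress_block(image, table[block])[within:within + length]


def total_size(table, image, entry_bytes=2):
    """Compressed size including the table's own cost (entry size assumed)."""
    return len(image) + entry_bytes * len(table)


code = bytes(32) + bytes.fromhex("deadbeef") + bytes(12)
table, image = build_image(code)
print(len(code), total_size(table, image), fetch(table, image, 32).hex())
```

The core keeps asking for ordinary uncompressed addresses, and `fetch` does the table lookup and expansion behind its back, which is the sense in which such a scheme is transparent. Counting the table in `total_size` mirrors Kochuguev's point that its cost is included in the quoted compression ratio.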
Kochuguev showed us a code dump before and after compression, and it was easy to see, for example, how strings of zeros in the uncompressed code were stripped out in the compressed code, significantly reducing its byte count. And when he ran the code in both uncompressed and compressed form, the execution time difference was less than 5 per cent.
The prototype that he was demonstrating was a single-core setup, but Kochuguev said that there would be no reason why the design couldn't be extended to multicore architectures. "But we didn't assess that situation," he added.
As microcontrollers begin to appear in some of the "IoT" devices that Intel was showing off or discussing in other Research@Intel demos – intelligent drapes, coffee machines, toothbrushes, baby monitors, stereos, alarm clocks, supermarket shelves, air-quality sensors, and more – even small savings in die size and power consumption could certainly add up fast.
How fast? Intel estimates that there will be 50 billion connected devices worldwide by 2020. ®