It's all in the RISC: Arm legs it to Computex with a head full of Cortex-A77 CPU, Mali-G77 GPUs

That's enough body language puns

Someone taking a photo on a smartphone in the countryside
Arm's suggested illustration for its Cortex-A77 news, proving once and for all that people take photos with their phones

Chip design factory Arm is rolling another CPU core off the assembly language line: the Cortex-A77. It'll probably be the brains of high-end smartphones, modest slab-tops, and other devices shipping early next year.

In time for the Computex 2019 industry jamboree in Taiwan this coming week, Arm on Sunday spoke publicly about the A77, a follow-on from last year's Cortex-A76. Arm estimates that a single A77 core delivers up to 20 per cent higher IPC (instructions per cycle) than an A76 core, when both run at 3GHz on a 7nm process.

The A77 has, as far as we can tell, no scary surprises. It is an Armv8.2-compatible CPU core capable of running 32-bit and 64-bit application code, with 64KB L1 instruction and data caches, a 256KB or 512KB L2 cache, and up to 4MB of L3 cache. You can combine up to four of them with four smaller cores, such as the Cortex-A55, into a big.LITTLE arrangement, with the A55s handling light tasks until the A77s spin up to run heavier code.
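For the curious, here is a minimal Python sketch of the configuration limits described above, plus a toy version of the big.LITTLE idea. The class, function, and constant names are our own inventions, the 0.6 load threshold is an arbitrary illustrative number, and treating the little-core count as "up to four" is our assumed reading. Real schedulers, such as those in operating system kernels, are far more involved than this.

```python
from dataclasses import dataclass

# Limits taken from the paragraph above; the names are hypothetical.
ALLOWED_L2_KB = (256, 512)
MAX_L3_MB = 4
MAX_BIG_CORES = 4
MAX_LITTLE_CORES = 4  # assumption: "with four smaller cores" read as an upper bound


@dataclass(frozen=True)
class ClusterConfig:
    big_cores: int = 4      # e.g. Cortex-A77
    little_cores: int = 4   # e.g. Cortex-A55
    l2_kb: int = 512
    l3_mb: int = 4

    def __post_init__(self):
        if not 1 <= self.big_cores <= MAX_BIG_CORES:
            raise ValueError("big_cores must be between 1 and 4")
        if not 0 <= self.little_cores <= MAX_LITTLE_CORES:
            raise ValueError("little_cores must be between 0 and 4")
        if self.l2_kb not in ALLOWED_L2_KB:
            raise ValueError("l2_kb must be 256 or 512")
        if not 0 <= self.l3_mb <= MAX_L3_MB:
            raise ValueError("l3_mb must be at most 4")


def pick_core_type(load, threshold=0.6):
    """Toy placement rule: light work stays on little cores, heavy work goes big.

    `load` is a made-up utilization figure between 0.0 and 1.0, and the
    threshold is illustrative only, not anything Arm has published.
    """
    return "big" if load >= threshold else "little"


config = ClusterConfig()
print(config)
print(pick_core_type(0.2), pick_core_type(0.9))
```

The frozen dataclass rejects out-of-range values at construction time, so a bad configuration fails loudly rather than quietly modelling a chip nobody can license.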

According to Arm's technical docs seen by The Register ahead of today's launch, the A77 has double the A76's memory bandwidth to its branch predictor (64 bytes per cycle), sports improvements to its branch predictor's accuracy, has a 33 per cent larger main branch target buffer (8K entries), and has a four-times larger L1-BTB (64 entries, one-cycle latency).

The front end also features a 1,500-entry macro-operation cache, which can be considered an L0 decoded instruction cache, again increasing performance. The dispatch bandwidth is increased 50 per cent to six instructions per cycle, with a 160-entry out-of-order execution window, up 25 per cent on the previous generation. Integer execution bandwidth is increased 50 per cent, and there's now a second lane for performing AES cryptography.
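The percentages in the two paragraphs above imply a set of baseline figures for the A76. The Python below simply runs that arithmetic backwards. Note the assumptions: the dictionary keys are our own labels, "8K" is taken to mean 8 × 1,024, "33 per cent" is read as one third, and the resulting A76 numbers are derived from the article's figures rather than quoted from Arm.

```python
from fractions import Fraction


def baseline(new_value, increase):
    """Return the previous-generation value, given the new value and the
    fractional increase (0.5 means "up 50 per cent")."""
    return Fraction(new_value) / (1 + Fraction(increase))


# (A77 value, stated increase over the A76); key names are hypothetical labels.
a77 = {
    "fetch_bytes_per_cycle": (64, Fraction(1)),      # "double" is +100 per cent
    "main_btb_entries": (8 * 1024, Fraction(1, 3)),  # "33 per cent larger", read as 1/3
    "l1_btb_entries": (64, Fraction(3)),             # "four-times larger" is +300 per cent
    "dispatch_per_cycle": (6, Fraction(1, 2)),       # "increased 50 per cent"
    "ooo_window_entries": (160, Fraction(1, 4)),     # "up 25 per cent"
}

implied_a76 = {name: baseline(value, inc) for name, (value, inc) in a77.items()}

for name, value in implied_a76.items():
    print(f"{name}: A77 = {a77[name][0]}, implied A76 = {value}")
```

Using `Fraction` rather than floats keeps the one-third case exact, so the implied main BTB size comes out as a whole number of entries.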

On the subject of the macro-op cache, it's useful for speeding up instructions that can be decoded and broken into separate operations that are subsequently cached. "A common example is load instructions with an immediate pre-/post-index, where the base address register is also updated," a spokesperson for Arm's engineering team told us. "This instruction is cracked into a load and an 'update' macro-op."
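To make the cracking idea concrete, here is a toy Python model of a post-indexed load, in the style of the A64 instruction `LDR X0, [X1], #8`, being split into a load and a base-register update, with the resulting pair cached by program counter. Everything here is a hypothetical sketch: the class and function names are ours, the cache policy (fill until full, no eviction) is invented, and counting capacity in macro-ops is an assumption. It models the instruction's visible behaviour, not how Arm's hardware is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroOp:
    kind: str  # "load" or "update"
    dest: str  # register written
    base: str  # register read
    imm: int   # immediate added to the base register's value


def crack_indexed_load(dest, base, imm, pre_index):
    """Split a load with base-register writeback into two macro-ops."""
    update = MacroOp("update", base, base, imm)  # base = base + imm
    load = MacroOp("load", dest, base, 0)        # dest = memory[base]
    # Pre-index: update the base first, then load from the new address.
    # Post-index: load from the old address, then update the base.
    return (update, load) if pre_index else (load, update)


class MacroOpCache:
    """Toy decoded-instruction cache keyed by program counter."""

    def __init__(self, capacity=1500):  # 1,500 entries, per the article
        self.capacity = capacity
        self.lines = {}
        self.used = 0
        self.hits = 0
        self.misses = 0

    def fetch(self, pc, decode):
        if pc in self.lines:
            self.hits += 1
            return self.lines[pc]
        self.misses += 1
        ops = decode()  # the expensive decode-and-crack step
        if self.used + len(ops) <= self.capacity:
            self.lines[pc] = ops
            self.used += len(ops)
        return ops


def execute(ops, regs, memory):
    for op in ops:
        if op.kind == "load":
            regs[op.dest] = memory[regs[op.base] + op.imm]
        else:
            regs[op.dest] = regs[op.base] + op.imm


# A four-iteration loop that walks an array with a post-indexed load.
regs = {"x0": 0, "x1": 0x1000}
memory = {0x1000: 10, 0x1008: 20, 0x1010: 30, 0x1018: 40}
cache = MacroOpCache()
total = 0
for _ in range(4):
    ops = cache.fetch(
        0x400, lambda: crack_indexed_load("x0", "x1", 8, pre_index=False)
    )
    execute(ops, regs, memory)
    total += regs["x0"]

print(total, hex(regs["x1"]), cache.hits, cache.misses)
```

In this model, only the first trip around the loop pays for decoding. The later iterations pull the already-cracked pair straight from the cache, which is the kind of saving a decoded-instruction cache is there to provide.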

Here's a diagram summarizing the A77:

Arm's overview of the Cortex-A77

Source: Arm

Arm also tore the wraps off its super-scalar Mali-G77 graphics processor design, which features its new Valhall architecture, along with the accompanying Mali-D77 display processor and an improved neural-network processing unit. Like the A77, all of these are available to license and use in forthcoming system-on-chips. ®

Speaking of licensed designs... PowerVR GPU and neural network accelerator cores are now available to license via SiFive's DesignShare platform, allowing folks creating their own RISC-V system-on-chips to include Imagination's accelerators. SiFive is a developer of customizable RISC-V SoCs and processor blueprints, and also just acquired USB 2 and 3 designs from Innovative Logic as well as most of its team in India.

RISC-V is an open-source instruction set specification backed by Western Digital, Nvidia, Qualcomm, Google, and others – and is keeping Arm on its toes, to quote Arm CEO Simon Segars.




Biting the hand that feeds IT © 1998–2019