It's beginning to look a lot like multi-threaded CPUs, everywhere you go... Arm teases SMT Cortex-A65AE car brains

Robo-ride processor core acts a lot like Intel Hyper-Threading

Arm will today announce its Cortex-A65AE processor core aimed at powering self-driving cars and in-vehicle entertainment. Somewhat buried in the bumph we glimpsed ahead of the launch, though, is something very curious.

This will be, it is claimed, Arm's first-ever simultaneous multithreaded CPU core. As in, each core can run two separate threads at the same time, just like Intel's Hyper-Threading feature, and similar hardware threading in AMD and MIPS processors.

This is significant because Arm has resisted simultaneous multithreading (SMT), instead opting to lash together lots of cores in its big.LITTLE arrangement: a cluster of small cores running apps, and a cluster of larger cores powering up to take on bursts of intensive work.

Figure: Cortex-A65AE features, from an Arm slide on its multithreaded tech (Source: Arm)

SoftBank-owned Arm has toyed with SMT, publicly mulling on and off since around 2010 whether to add it to its blueprints, though it always discarded the idea and settled on multiple single-threaded cores instead. It produced a paper in 2013 [PDF] setting out why it wasn't happy with SMT: for mobile apps, it doesn't make sense in terms of performance gain and power usage, although it noted other settings could benefit from it.

You see, not all applications are boosted by SMT, and while some gain performance increases from running multiple threads through each available core, some programs do not benefit at all or are penalized by it. SMT typically works by splitting up the functions of a CPU core so that its various engines, such as integer and floating-point math units, are divvied up between two separate threads running simultaneously through the core. The result is fewer parts of the core left idle, and more software instructions completed per second, ideally.
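To make that trade-off concrete, here is a toy Python sketch, not a model of any real Arm or Intel pipeline. It assumes a hypothetical core with exactly one integer unit and one floating-point unit, where each thread issues at most one instruction per cycle, in order. The function name `run_cycles` and the instruction mixes are invented for illustration.

```python
from collections import deque

def run_cycles(threads):
    """Count cycles needed by a toy core with one 'int' and one 'fp' unit.

    Assumptions (illustrative only): each thread issues at most one
    instruction per cycle, in program order, and each unit accepts at
    most one instruction per cycle. Earlier threads get priority.
    """
    queues = [deque(t) for t in threads]
    cycles = 0
    while any(queues):
        busy = set()  # units already claimed in this cycle
        for q in queues:
            if q and q[0] not in busy:
                busy.add(q.popleft())
        cycles += 1
    return cycles

mixed_a = ["int", "int", "fp", "int"]
mixed_b = ["fp", "fp", "int", "fp"]
int_only = ["int"] * 4

# Running the two mixed threads one after the other: 4 + 4 = 8 cycles.
one_at_a_time = run_cycles([mixed_a]) + run_cycles([mixed_b])
# Running them together, they never want the same unit: 4 cycles.
smt_mixed = run_cycles([mixed_a, mixed_b])
# Two integer-only threads fight over one unit: 8 cycles, no gain.
smt_clash = run_cycles([int_only, int_only])
```

In this sketch the complementary threads finish in half the time when run side by side, while the two integer-only threads gain nothing, which is the "ideally" in the paragraph above.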

Well, now Arm is targeting more than just low-power devices and smartphone apps: it wants its Cortex AE series embedded in cars, where the chips can power entertainment displays, or run sensor and control code within an autonomous driving system. The firm earlier this year announced a 64-bit Armv8-A Cortex-A76AE (the AE stands for automotive enhanced), and now it is touting the 64-bit 7nm multithreaded Cortex-A65AE.

The A65AE's multiple hardware threads – two per core – are supposed to help the final system-on-chip suck in and analyze in real time more sensor data, such as camera feeds and LiDAR signals, per second, so that the self-driving software can make faster and more accurate predictions on what to do – turn the wheel, touch the brake, etc. Arm characterizes this as a high-throughput CPU design, in that there are multiple threads simultaneously handling incoming sensor and positioning data, allowing the computer to quickly make informed decisions. That may mean the difference between a smooth and safe ride and a hesitant jerky journey in one of these things.

We'll just set aside the thorny issue of the viability of autonomous vehicles for now, and concentrate on the underlying chip tech, which may make its way into other Cortex-A blueprints. Whether it's for political, technological, or psychological reasons, or a mix of all three, truly driverless jalopies could be anywhere from five to 30 years out.

Chipzilla plot twist

Amusingly, just as Arm is embracing SMT, not only is Intel cooking up its own version of big.LITTLE for its future x86-64 chips, but some folks recommend disabling Intel's Hyper-Threading feature for security reasons – particularly if your software doesn't benefit from it.

To be clear, Arm thinks the Cortex-A65AE will power the brains of next-gen advanced driver-assistance systems – aka super-cruise-control – that are shy of fully autonomous robo-rides, which may or may not come later. Indeed, Lakshmi Mandyam, veep of automotive within Arm, will today talk about the Cortex-A65AE "strengthening driver trust on the road to safe mass autonomous deployment." The aim is to build super-cruise-control assistance that people feel comfortable relying on, before going into the hard sell on autonomous motors.

As with the A76AE, the A65AE touts various industry standard features, and split-lock functionality. In lock mode, two cores are paired up and execute the same instructions at the same time in lockstep, clock cycle by clock cycle. If one core diverges from its twin, this indicates a random hardware error has happened, such as a transistor gate flipped by cosmic radiation, and this stumble can be caught and recovered from automatically.
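The lockstep idea can be sketched in a few lines of Python. This is an illustration of the compare-every-step principle only, under the assumption that each "core" is just a running integer value; it says nothing about how Arm's hardware comparator or recovery logic actually works. The `run_lockstep` function, its `fault_at` parameter, and the bit that gets flipped are all hypothetical.

```python
def run_lockstep(program, state, fault_at=None):
    """Run `program` on two redundant copies and compare after every step.

    `program` is a list of functions int -> int. `fault_at` is a
    hypothetical test hook: at that step, one bit of the second copy is
    flipped to mimic a random hardware error.
    """
    a = b = state
    for step, op in enumerate(program):
        a = op(a)
        b = op(b)
        if step == fault_at:
            b ^= 1 << 3  # simulated bit flip in the second "core"
        if a != b:
            # Divergence detected; a real system would now recover.
            return ("diverged", step)
    return ("ok", a)

program = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

clean = run_lockstep(program, 5)               # ("ok", 9)
faulty = run_lockstep(program, 5, fault_at=1)  # ("diverged", 1)
```

The cost is visible even in the toy: two cores' worth of work yields one core's worth of results, which is why only the safety-critical code would be run this way.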

Thus, you can run a group of A65AE cores in lock mode, running safety-critical engine control code that cannot fail, and others in split mode for extra performance. The final decision on controlling the vehicle is taken in code in lockstep mode, ideally. The A65AE CPU cores can also interface with any connected accelerators, such as machine-learning processors via an Accelerator Coherency Port, to speed up specialist tasks, such as neural-network inference.

Figure: Sensor data piped through multiple threads (Source: Arm slide)

Arm is imagining a system with, say, A65AE CPU cores feeding in and processing incoming sensor data, passing the information to two clusters of A76AE and A65AE cores to run through trained self-driving AI models, with any attached accelerators speeding up that inference and heuristics work, and finally a cluster of A76AE cores making the ultimate decision at the wheel. The final speeds and configuration of the CPU cores will be left to the system-on-chip designers. As a guideline, the non-automotive A76 tops out at 3GHz.

We do know that A65AEs can be grouped into clusters of up to eight cores, so that's 16 hardware threads per cluster, or eight locked threads when running in lockstep mode, since each locked pair of cores executes as one. In other words, enabling SMT does not disable lock mode: locked SMT mode pairs up threads so that any random deviation between them can be caught.
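The arithmetic is simple enough to check in Python. This sketch assumes the figures given in this article (two threads per core, cores paired one-to-one in lock mode) and that every core in the cluster is in the same mode; the function name `usable_threads` is made up for illustration.

```python
def usable_threads(cores=8, threads_per_core=2, locked=False):
    """Independent hardware threads in a cluster, per the article's figures.

    In lock mode, cores are assumed to pair up one-to-one, so each pair
    contributes only one core's worth of threads.
    """
    if locked:
        if cores % 2:
            raise ValueError("lock mode needs an even number of cores")
        cores //= 2
    return cores * threads_per_core

split_mode = usable_threads()             # 16
lock_mode = usable_threads(locked=True)   # 8
```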

Here's a set of tables showing how hardware threads are shared across eight cores in an A65AE cluster, from 16 threads in split mode to eight in lock mode:

Figure: Arrangement of threads and cores in lock and split mode with SMT enabled (Source: Arm engineering)

More details on the architecture are expected to be revealed early next year; right now, this is a bit of a teaser, and a surprise mention of multithreading.

Arm hopes semiconductor designers will license the Cortex-A65AE and get systems-on-chip built and on sale in 2020. Expect to see an announcement pop up here sometime today. ®




Biting the hand that feeds IT © 1998–2019