Original URL: https://www.theregister.com/2011/10/20/details_on_big_little_processing/

Deep inside ARM's new Intel killer

big.LITTLE bad news for Chipzilla

By Rik Myslewski

Posted in Personal Tech, 20th October 2011 23:12 GMT

ARM has swung a one-two punch at Intel's plans to muscle in on the smartphone and tablet space that's currently dominated by the plucky chip designers from Cambridge.

At press soirées in London and San Francisco on Wednesday, ARM announced both a design for a tiny new chip, the Cortex-A7 MPCore, and a system-on-chip scheme that will marry the new A7 with the much more robust Cortex-A15 MPCore, which was announced last September and which should see the light of day next year.

We will grit our teeth and use ARM's designation for this multi-core mashup – big.LITTLE processing – which is ARM's new marketing term for its implementation of what it, AMD, Microsoft, and others were calling "heterogeneous computing" at AMD's Fusion Summit this summer.

Simply put, heterogeneous computing means putting a number of dissimilar, specialized cores on the same slice of silicon – CPU and GPU cores, for example – and parcelling out tasks to each core for the work that suits it best.

AMD leveraged its graphics and x86 expertise, and slapped both Radeon graphics and Phenom compute cores onto its A-series "APUs" – accelerated processing units – that it released this June.

ARM, on the other hand, is playing to its own strengths: its ability to design ultra-lower-and-lower power chips, and the work it has done on the Cortex-A15 – which, it can persuasively be argued, the Cambridge company is aiming as much at the low-end laptop, desktop, and mini-server markets as it is at its traditional handheld hegemony.

As we reported after the A7's Wednesday unveiling, the big.LITTLE partnership of the A7 and A15 will let A7 cores handle the easy stuff – background processes, making phone calls, tweeting, Facebooking – while the A15 cores will kick in when more oomph is needed for video, gaming, and the like.

The two core designs – which can both be built in one to four-core implementations – are certainly different enough to be handed quite different tasks, but it is their similarities that make the big.LITTLE scheme work.

For example, both can have their own L2 cache, and both of those caches can communicate over ARM's CCI-400 cache coherent interconnect.

ARM big.LITTLE system block diagram

The A7 and A15 may look identical in this diagram, but they most certainly aren't

More important, however, is that the two cores are based on the same architecture: ARM's ARMv7-A. Being twins under the skin, they'll both be able to run well-behaved ARM code with no modifications needed.

And, more important still, the big.LITTLE scheme will be able to toss that code at either the A7 or A15 cores without the software needing to know where it's going, thanks to modern mobile operating systems' ability to tell a chip what power an app needs. The OS doesn't need to know to which cores big.LITTLE is sending the work – ARM's scheme will take care of that housekeeping on its own.

Which is a good thing, seeing as how despite their similarities, the A7 and A15 are quite different in how they get work done.

ARM Cortex-A7 MPCore pipeline

The ARM Cortex-A7 MPCore's simple 8-to-10 stage pipeline

The A7 is far simpler and less powerful than the A15. But its simplicity requires far fewer transistors than does the A15's complexity – and fewer transistors require less juice to operate.

Specifically, as explained in an ARM white paper, the A7 is an in-order, non-symmetric processor with a pipeline length of 8 to 10 stages. The li'l fellow has a single queue for all of its execution units, and two instructions can be sent to its five execution units per clock cycle.

ARM Cortex-A15 MPCore pipeline

The more complex ARM Cortex-A15 MPCore has a 15-to-24 stage pipeline

The A15, on the other hand, is an out-of-order processor with a pipeline length of 15 to 24 stages. Each of its eight execution units has its own multi-stage queue, and three instructions can be processed per clock cycle.
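For readers who want to see why in-order versus out-of-order matters, here is a toy model in Python. It is a rough sketch, not a simulation of either real core: the `simulate` function, the five-instruction program, and its latencies are all invented for illustration. Only the issue widths (two per cycle for the A7, three for the A15) come from the figures above.

```python
from typing import Dict, List, Tuple

# (name, latency in cycles, names of earlier instructions it depends on)
Instr = Tuple[str, int, Tuple[str, ...]]


def simulate(program: List[Instr], width: int, in_order: bool) -> int:
    """Return the cycle at which the last result becomes available.

    Hypothetical toy model: up to `width` instructions issue per cycle.
    An in-order core stops at the first instruction whose inputs are
    not ready; an out-of-order core may skip past it.
    """
    done_at: Dict[str, int] = {}   # name -> cycle its result is ready
    pending = list(program)
    cycle = 0
    while pending:
        issued = 0
        for instr in list(pending):
            if issued == width:
                break
            name, latency, deps = instr
            ready = all(d in done_at and done_at[d] <= cycle for d in deps)
            if ready:
                done_at[name] = cycle + latency
                pending.remove(instr)
                issued += 1
            elif in_order:
                break  # stall behind the blocked instruction
        cycle += 1
    return max(done_at.values())


# Invented program: a slow load, one instruction that needs it,
# and three instructions that do not.
program: List[Instr] = [
    ("load_a", 3, ()),
    ("add_1", 1, ("load_a",)),
    ("mul_b", 1, ()),
    ("mul_c", 1, ()),
    ("add_2", 1, ("mul_b", "mul_c")),
]

a7_like = simulate(program, width=2, in_order=True)
a15_like = simulate(program, width=3, in_order=False)
print(a7_like, a15_like)
```

In this made-up case the in-order model waits on the slow load before touching the independent multiplies, while the out-of-order model gets on with them. The price of that reordering in real silicon is extra queues and bookkeeping logic, which is where the A15's additional transistors and power draw come from.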

As ARM explains, "In general, there is a different ethos taken in the Cortex-A15 micro-architecture than with the Cortex-A7 micro-architecture. When appropriate, Cortex-A15 trades off energy efficiency for performance, while Cortex-A7 will trade off performance for energy efficiency."

This performance/power trade-off can be seen in an ARM supplied chart:

ARM big.LITTLE system performance details

There's a lot of work that can be done down there in red-line territory

It's interesting to look at the gap between the far right of the A7's line and the far left of the A15's. Into that gap you can fit all of those A15 transistors – idle or not, they need circuitry to keep an eye on them and the rest of the die, and tell them when to wake up.

You might reasonably be concerned whether big.LITTLE will waste time switching from one core cluster to another. According to ARM, there's little to be concerned about, seeing as how a full switchover can take place in about 20,000 cycles.

Now, that might sound like a long time, but think of it this way: if the A7-A15 matchup is cruising along at 1GHz, the switchover will take a mere 20 microseconds. So if big.LITTLE can be smart enough to keep a task from getting stuck switching back and forth between high A7 and low A15 – which shouldn't be rocket science – all should be well.
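The arithmetic, and the "don't ping-pong" idea, can be sketched in a few lines of Python. The 20,000-cycle figure is ARM's, as quoted above; everything else here is an assumption for illustration. `ClusterGovernor`, its 0-to-1 load estimate, and the 0.85 and 0.40 thresholds are hypothetical names and numbers, not ARM's switching logic.

```python
SWITCH_CYCLES = 20_000  # ARM's quoted cost of a full cluster switch


def switch_time_us(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at a given clock rate."""
    return cycles * 1e6 / clock_hz


class ClusterGovernor:
    """Choose a cluster from a load estimate, using hysteresis.

    Hypothetical policy: move up to the A15 only above `up`, and back
    down to the A7 only below `down`. The gap between the two
    thresholds keeps a borderline load from bouncing between clusters.
    """

    def __init__(self, up: float = 0.85, down: float = 0.40) -> None:
        self.up = up
        self.down = down
        self.cluster = "A7"
        self.switches = 0

    def update(self, load: float) -> str:
        if self.cluster == "A7" and load > self.up:
            self.cluster = "A15"
            self.switches += 1
        elif self.cluster == "A15" and load < self.down:
            self.cluster = "A7"
            self.switches += 1
        return self.cluster


# An invented load trace that hovers around the boundary for a while.
loads = [0.3, 0.9, 0.8, 0.7, 0.9, 0.5, 0.3, 0.2]

with_gap = ClusterGovernor(up=0.85, down=0.40)
no_gap = ClusterGovernor(up=0.75, down=0.75)  # single threshold
for load in loads:
    with_gap.update(load)
    no_gap.update(load)

print(switch_time_us(SWITCH_CYCLES, 1e9))   # microseconds at 1GHz
print(with_gap.switches, no_gap.switches)
```

On this trace the single-threshold version switches clusters twice as often as the one with a gap between its thresholds. At 20 microseconds a pop that is hardly fatal, but each switch is time spent doing no useful work.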

In addition, a mode known as big.LITTLE MP can have both core clusters in operation at once, sharing data over that cache coherent interconnect – and "snooping" to ensure that the right data is in place at the right time, with no conflicts.

More worries for Intel

ARM has lined up an impressive list of partners in support of big.LITTLE: Broadcom, Compal, Freescale, HiSilicon, LG Electronics, Linaro, OK Labs, QNX, Redbend, Samsung, Sprint, ST-Ericsson, Texas Instruments. But the one industry biggie who's likely taking the greatest interest in it – and not necessarily in a positive light – is Intel.

For years, Intel has been trying to push its products down into the exploding smartphone and mobile-device market. As of today, they've not been successful. ARM dominates. Well, more than dominates, really – ARM owns the market.

As each of its processor generations rolls out, Intel assures us that its move into mobile devices is about to succeed. However, 2008's "Menlow" was a bust, and last year's "Moorestown" didn't exactly set the world on fire.

Remember Intel's dreams of MIDs - mobile internet devices? Smartphones ate them alive, and smartphones run on ARM.

Today, Intel still has high hopes for its 32 nanometer "Medfield" processors, and even higher hopes for lower-power, better-performing chips built using its recently announced 22nm Tri-Gate process.

But now it has a new worry: an ARM scheme that can dive deep down into the low, lower, lowest power zones, then rise up to levels needed for decent gaming, video, augmented reality, and other processor-intensive mobile apps.

Oh, and the Cortex-A15 MPCore will run Windows, as well.

Sure, Intel has fabulous process technology and a few thousand of the smartest engineers on the planet – but ARM has the mobile market. And Wednesday's announcement of the lower-power, faster A7 and the wide-ranging big.LITTLE mashup proves that the folks from the UK are serious about continuing their domination. ®