AMD's new Carrizo: The x86 notebook processor that thinks it's a GPU
We drill into the tech claims
ISSCC 2015 AMD claims its new x86-powered Carrizo system-on-chip for notebooks has more transistors and yet consumes less power than the previous generation Kaveri – and has shown off some of its engineering to help back that up.
In time for this week's International Solid-State Circuits Conference in San Francisco, AMD has prepared a presentation on its Carrizo accelerated processing unit (APU) – which packs four Excavator x86 cores and eight Radeon GPU cores for laptops and similar gear.
The APU is due out sometime in the first half of this year: the chip giant today wants to talk about the system-on-chip's architecture rather than its feeds and speeds.
One key thing to keep in mind is that Carrizo is a 28nm process chip: while Intel is full steam ahead on its 14nm FinFET designs, AMD is betting big on its ability to squeeze as much performance per watt out of the larger gate size before it starts shrinking processes.
Also: just 16 per cent of the 250mm² Carrizo die is x86 compute; the rest is graphics, video acceleration and IO. This is AMD's attempt to differentiate itself in the mobile market.
AMD chief technology officer Mark Papermaster likened his biz to a "scrappy engineering company" that had to realize, in the face of competition from goliaths like Intel, that it had to concentrate its talent on a few crucial areas – in this case, going after Chipzilla's Core i5 family and extracting as much oomph out of notebook and laptop-tablet-combo batteries as possible.
We're told the chip has 3.1 billion transistors, 29 per cent more than the 28nm Kaveri, on about the same size die. The extra transistors are used for the graphics hardware, the integrated Southbridge (which controls the PC's peripherals), acceleration for 4K H.265 video playback, HSA, and other features.
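The die figures quoted so far can be sanity-checked with a few lines of arithmetic. The sketch below is ours, not AMD's: it takes the 3.1 billion count, the 29 per cent uplift, the 250mm² die and the 16 per cent x86 share at face value, and the variable names are purely illustrative.

```python
# Back-of-envelope check of the die figures quoted in the article.
# All inputs are the article's numbers; the names are illustrative only.

CARRIZO_COUNT = 3.1e9        # the article's "3.1 billion" figure
UPLIFT_OVER_KAVERI = 0.29    # "29 per cent more" than Kaveri
DIE_AREA_MM2 = 250.0         # Carrizo die area in square millimetres
X86_SHARE = 0.16             # fraction of the die that is x86 compute

# Work backwards from the uplift to the count Kaveri would need to have.
implied_kaveri = CARRIZO_COUNT / (1 + UPLIFT_OVER_KAVERI)

# Split the die into x86 compute and everything else (GPU, video, IO).
x86_area_mm2 = DIE_AREA_MM2 * X86_SHARE
other_area_mm2 = DIE_AREA_MM2 - x86_area_mm2

# Average density across the whole die.
density_per_mm2 = CARRIZO_COUNT / DIE_AREA_MM2

print(f"Implied Kaveri count: {implied_kaveri / 1e9:.2f} billion")
print(f"x86 compute area:     {x86_area_mm2:.0f} mm^2")
print(f"Everything else:      {other_area_mm2:.0f} mm^2")
print(f"Average density:      {density_per_mm2 / 1e6:.1f} million per mm^2")
```

On those inputs, the x86 cores occupy roughly 40mm² and the remaining 210mm² or so goes to graphics, video and IO, which is the point AMD is making about where its silicon budget goes.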
With this amount of stuff built into the APU, fewer chips have to appear on the motherboard of a laptop: "the smaller the BOM [bill of materials], the more space for the battery," Joe Macri, AMD's product chief technology officer, told The Register this evening. Bigger batteries mean more minutes on Twitter.
The Carrizo APU conforms to HSA 1.0, an architecture in which the GPU and CPU cores share the whole physical memory map, working together coherently on the same blocks of data as they execute code. AMD reckons this means software can more easily offload things like image recognition to the GPU side of the silicon and keep the CPUs running other things.
The HSA 1.0 specification is due to be published in the next few weeks.
Let's break down AMD's bigger boasts – because reduced power consumption and increased performance mean laptops using these chips will get more bang from their batteries. The chip maker is aiming at a TDP of 12 to 35W per Carrizo APU package, with the x86 cores drawing 5 to 10W of that.
The new x86 Excavator cores
AMD has taken the design tools it uses to lay out its GPU electronics and applied them to its Steamroller x86 application core. This has, we're told, shrunk the core down to what's been presented this week: the Excavator series. The area used by the application core has been reduced by 23 per cent, allowing Excavators to generally run at a higher clock frequency than Steamrollers while consuming the same amount of power.
The Excavators have more gates than the Steamrollers, too, mainly to add in support for Intel Haswell instructions.
The shrinkage is made possible by GPU-style metallization. The wires connecting up gates in CPU cores tend to be fat and tapered – ideal for transmitting signals at high frequency in serial. GPU cores process data in parallel at a lower frequency, using networks of thinner wires to trade speed for parallelization. By shifting the x86 core over to thinner, GPU-like stacks of metal interconnects, AMD says it can save a lot of space while still providing enough performance for laptops: a five per cent increase in instructions executed per cycle and up to 40 per cent less power consumed than the previous generation.
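To see what those two headline figures would mean together, here is a rough sketch of the performance-per-watt arithmetic. It rests on assumptions AMD has not confirmed here: that both figures apply at the same clock frequency and at the same time, and that the "up to" 40 per cent is actually reached. Treat the result as a best case, not a measurement.

```python
# Best-case performance-per-watt estimate from the article's two figures.
# Assumption (ours): same clock frequency, and both gains hold at once.

ipc_gain = 0.05     # "five per cent increase in instructions executed per cycle"
power_cut = 0.40    # "up to 40 per cent less power consumed"

relative_performance = 1 + ipc_gain    # versus the previous generation = 1.0
relative_power = 1 - power_cut         # versus the previous generation = 1.0

# Performance per watt scales as performance divided by power.
perf_per_watt_gain = relative_performance / relative_power

print(f"Best-case performance per watt: {perf_per_watt_gain:.2f}x")
```

If the real-world power saving is smaller than the quoted maximum, the ratio falls accordingly; lowering `power_cut` in the sketch shows how quickly.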
"We've taken the high-density design library of the GPU, and applied it to the CPU. This has squeezed the design down in terms of power and area," Sam Naffziger, the AMD corporate fellow who oversaw the power management of Carrizo, told The Register.
What happens when you apply GPU design to an x86 CPU
The level two (L2) cache for the cores has been halved to 1MB compared with Kaveri, leaving more die space for other hardware. The per-core L1 cache has been doubled to 32KB, and the on-chip buffers and FIFOs have also been enlarged.
"The [L2] cache doesn’t consume a lot of power, and halving it makes more die area available. 1MB is good enough for most applications," Naffziger added.
Adaptive voltage and frequency scaling (AVFS)
The GPUs and CPUs are better than Kaveri's at dealing with temporary noise on their supply voltage, we're told: rather than demand an over-voltage to cope with any wobbles in the supply, the chip can, in less than a nanosecond, respond to any drop below the minimum threshold by scaling back clock frequencies and consuming less power – in other words, gracefully cope with the supply droop.
If you'll forgive the slightly clumsy metaphor, when driving over bumps in the road, you can either use expensive suspension to absorb the shocks, or slow down. AMD's Carrizo prefers to hit the brakes, but its engineers say this is good for power consumption overall: it avoids running all the time with an excessive Vdd voltage that is a waste of power when the supply is good.
This is supposed to reduce the CPUs' power consumption by up to 20 per cent and the GPUs' by up to 10 per cent.
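The trade-off can be illustrated with a toy model. The sketch below is not AMD's method and uses no AMD data: it assumes the textbook approximation that switching power scales with voltage squared times frequency, and every input (the 10 per cent guardband, droops for 5 per cent of the time, a 20 per cent clock cut during a droop) is invented for illustration.

```python
# Toy model of voltage guardbanding versus adaptive clock scaling.
# Every number here is an illustrative assumption, not AMD data.

def dynamic_power(voltage, frequency, capacitance=1.0):
    """Textbook CMOS switching-power approximation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

def guardbanded_power(v_min, guardband, frequency):
    """Run permanently above v_min so a droop never dips below it."""
    return dynamic_power(v_min + guardband, frequency)

def adaptive_power(v_min, frequency, droop_fraction, slowdown):
    """Run at v_min; while the supply droops, scale the clock by `slowdown`.

    Power during a droop is computed at v_min, which overstates it slightly
    because the real supply voltage is lower at that moment.
    """
    normal = dynamic_power(v_min, frequency)
    throttled = dynamic_power(v_min, frequency * slowdown)
    return (1 - droop_fraction) * normal + droop_fraction * throttled

def adaptive_throughput(droop_fraction, slowdown):
    """Average clock rate relative to never throttling."""
    return (1 - droop_fraction) + droop_fraction * slowdown

# Normalised units: v_min = 1.0 and frequency = 1.0.
fixed = guardbanded_power(v_min=1.0, guardband=0.1, frequency=1.0)
adaptive = adaptive_power(v_min=1.0, frequency=1.0,
                          droop_fraction=0.05, slowdown=0.8)
saving = 1 - adaptive / fixed
throughput = adaptive_throughput(droop_fraction=0.05, slowdown=0.8)

print(f"Power saved by adapting:  {saving:.1%}")
print(f"Average clock rate kept:  {throughput:.1%}")
```

The output depends entirely on the made-up inputs, so it says nothing about whether AMD's quoted savings are achievable. What it does show is the shape of the argument: because power rises with the square of voltage, a small permanent over-voltage costs more than an occasional brief slowdown.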
On the subject of frequency scaling, the APU's power management hardware is wired deep into the Excavator cores, allowing it to sense how well each one is performing. As well as the usual temperature and power-usage sensors, the AVFS tech can detect the variances in the voltages and frequencies of signals across the die, and regulate the processor's activities to keep things stable. Not every die is created equal: the silicon's characteristics vary ever so slightly from one die to the next, prompting AMD to tune its cores in this way.
There are 10 AVFS modules per Excavator core, we're told, containing 500 frequency sensing points.
"We're making the circuits a lot smarter, adapting and measuring voltage and speed capability and the core temperature, and informing the power manager of what they’re doing," Naffziger told us.
"Gone are the days of a 'one size fits all' processor. Carrizo has more access to the internals of the CPU and the GPU."
The power manager is also, we're told, rather good at switching very quickly to an equivalent of the S3 power state, in which the APU consumes less than 50mW. This state is called S0i3, and leaves just the power electronics and fusion controller hubs running, awaiting some kind of interrupt. It can be entered far more rapidly than S3; the S0i3 and S3 states require the operating system to intervene to enter and leave, but with the Southbridge and IO integrated, the APU can put peripherals to sleep and bring them back up faster than its predecessor. That means notebooks can wake from a low-power state very quickly, we're told.
There's also a 32-bit ARM-compatible microcontroller in the APU that implements ARM's TrustZone; it is the first core to execute code when the system-on-chip powers up, and is expected to initialize the package and ensure a trusted operating system is loaded.
Where's all this going
AMD's execs are especially proud of Carrizo, and they need to be. This APU has to do some unblocking. Sliding demand for PCs and other mishaps have left California-headquartered AMD with a distribution channel stuffed with about $100m in unsold chips.
Legend has it that when Nintendo execs met hardware engineers at ATI to discuss putting their graphics processors in the Wii, a modest console that turned into a runaway success, the electronics bods were told: "The hardware doesn't matter." The Wii's focus on family entertainment was vital to its triumph in the living room; the silicon just had to be good enough.
AMD, today the owner of ATI, has to face up to a similar peril: no matter how good Carrizo turns out to be, its commercial success is dependent on the computers it ships in. If it's bunged in over-priced notebooks, or slipped into watt-guzzling combo-laptops, all the silicon-level engineering work will have gone to waste. If the gear is decent, but everyone hates the default operating system – let's say, Windows 10 – who will remember the gains at the nanometer level?
Naffziger told us AMD is leaning on manufacturers hard to ensure nothing is squandered: every machine using a Carrizo must have every component justified to ensure power isn't wasted needlessly.
"We're not just chucking chips over the wall," smiled Macri. ®