AMD details aggressive power-efficiency goal: 25X boost by 2020
10X lift in past six years, 25X in next six thanks to real-time tuning, sleepy video
And now the real work begins
But there's much more to come, Naffziger told us. For example, although on-die power management has become smarter over the past few years, there's a lot more efficiency to be squeezed out of it.
Currently, an AMD APU has three primary voltage planes – areas supplied with their own power sources. The more voltage planes you have, the better you can control how much juice your die is consuming.
For example, Naffziger said, the GPU shares a voltage plane with the Northbridge – the memory interface. "A lot of times the GPU isn't doing anything," he said, "but the Northbridge is having to supply data to the CPUs, so you have to keep that voltage plane pretty high."
The GPU is power-gated with on-die switches, he said, but the switches are "imperfect – they're only about 10 per cent efficient – and then there's some things that you just can't power-gate." Solution: separate voltage planes for the GPU, Northbridge, CPU, caches, whatever. More control means more efficiency when each piece of the die gets exactly the voltage it needs.
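To see why splitting planes pays off, here is a minimal sketch using the textbook CMOS model: dynamic power ≈ activity × C × V² × f, plus leakage power ≈ V × I_leak. The block names, capacitances, currents and voltages are hypothetical illustrations, not AMD figures. Leakage current is also held constant, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One region of the die. All values are hypothetical illustrations."""
    name: str
    cap_nf: float    # effective switched capacitance, nanofarads
    freq_ghz: float  # clock frequency, GHz
    activity: float  # fraction of that capacitance switching per cycle (0..1)
    leak_a: float    # residual leakage current in amps, held constant (a simplification)
    v_min: float     # lowest voltage this block can run (or retain state) at right now


def block_power_w(b: Block, volts: float) -> float:
    """Dynamic power (activity * C * V^2 * f) plus leakage power (V * I_leak)."""
    dynamic = b.activity * (b.cap_nf * 1e-9) * volts ** 2 * (b.freq_ghz * 1e9)
    leakage = volts * b.leak_a
    return dynamic + leakage


def shared_plane_w(blocks: list[Block]) -> float:
    # One supply for everything: the plane must sit at the highest voltage any block needs.
    v = max(b.v_min for b in blocks)
    return sum(block_power_w(b, v) for b in blocks)


def split_planes_w(blocks: list[Block]) -> float:
    # One supply per block: each gets only the voltage it needs.
    return sum(block_power_w(b, b.v_min) for b in blocks)


if __name__ == "__main__":
    # An idle, imperfectly gated GPU next to a busy Northbridge (hypothetical numbers).
    gpu = Block("GPU (idle, gated)", cap_nf=2.0, freq_ghz=0.6, activity=0.0, leak_a=0.5, v_min=0.7)
    nb = Block("Northbridge (busy)", cap_nf=1.0, freq_ghz=1.8, activity=0.2, leak_a=0.2, v_min=1.1)
    print(f"shared plane: {shared_plane_w([gpu, nb]):.3f} W")
    print(f"split planes: {split_planes_w([gpu, nb]):.3f} W")
```

With these numbers the shared plane draws about 1.21 W and the split planes about 1.01 W. The whole saving comes from the idle GPU leaking at 0.7 V instead of being dragged up to the Northbridge's 1.1 V. Real leakage current rises with voltage, so a constant-current model would tend to understate the gain.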
"Now, that's easier said than done," Naffziger said. The die's smarts need to know exactly how much juice to apply, exactly where, at exactly the right instant – but subdividing the die into multiple voltage planes is the start.
"Once we've subdivided the voltage planes," he said, "then we can optimize each one of them – and there's a whole bunch of adaptive techniques to use, some of which we've deployed, but most of which are in development that enable that kind of real-time adaptation."
As The Reg explained in our deep dive into AMD's Kaveri chip earlier this month, that chip has thousands of monitors arrayed about its die, some keeping track of temperature, and many more tracking activity and power usage. Information from these monitors is used to boost, throttle, shut down, and maintain die elements, with the goal of operating the chip at maximum efficiency.
While that may sound relatively straightforward, it's actually maddeningly complex to optimize the use of the data from those monitors to tune the chip in real time. For just one example, although temperature sensors provide valuable feedback that can be used to adjust power, there's a built-in thermal latency to temperature sensing, a latency not shared by monitors that are simply reporting on activity or power draw.
One solution to this conundrum, Naffziger said, is to use inferences algorithmically derived from power and activity data to proactively deal with impending temperature changes – speeding up the fan, for example, or switching processing from an about-to-be-hot core to a cooler, underutilized one.
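The idea of acting on power data before the temperature sensors catch up can be sketched with a first-order RC thermal model: a core's temperature moves exponentially toward a steady state set by its power draw. This is a minimal illustration under assumed parameters. The thermal constants, the thresholds, and the `plan` policy (migrate to a cool core, else raise the fan) are hypothetical and do not describe AMD's actual algorithms.

```python
import math

# Hypothetical per-core thermal parameters (illustrative only, not AMD figures).
HEATSINK_C = 45.0    # temperature the core relaxes toward at zero power, deg C
R_TH_C_PER_W = 4.0   # thermal resistance, core to heatsink, deg C per watt
TAU_S = 0.5          # thermal time constant, seconds
LIMIT_C = 95.0       # temperature at which the core would have to throttle, deg C
MARGIN_C = 15.0      # how far below the limit a core must be to accept migrated work


def predict_temp_c(temp_c: float, power_w: float, horizon_s: float) -> float:
    """First-order RC model: temperature moves exponentially toward the
    steady state HEATSINK_C + power * R_TH with time constant TAU_S."""
    steady = HEATSINK_C + power_w * R_TH_C_PER_W
    return steady + (temp_c - steady) * math.exp(-horizon_s / TAU_S)


def plan(temps_c: list[float], powers_w: list[float], horizon_s: float = 0.25):
    """Decide actions from where each core's temperature is heading, using
    power readings (no thermal lag) rather than waiting for the sensors."""
    predicted = [predict_temp_c(t, p, horizon_s) for t, p in zip(temps_c, powers_w)]
    actions = []
    for core, t_pred in enumerate(predicted):
        if t_pred <= LIMIT_C:
            continue
        others = [c for c in range(len(predicted)) if c != core]
        coolest = min(others, key=lambda c: predicted[c], default=None)
        if coolest is not None and predicted[coolest] < LIMIT_C - MARGIN_C:
            actions.append(("migrate", core, coolest))  # move work to a cooler core
        else:
            actions.append(("fan_up", core, None))      # nowhere cooler to go: raise cooling
    return predicted, actions


if __name__ == "__main__":
    # Core 0's sensor reads 85 C (below the limit), but it is drawing 18 W.
    predicted, actions = plan([85.0, 60.0], [18.0, 3.0])
    print([round(t, 1) for t in predicted], actions)
```

Here core 0 is flagged for migration even though its sensor still reads below the limit, because its power draw says it will cross the limit within a quarter of a second. A purely reactive scheme would wait for the sensor and then have to throttle.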
"That's one example of the innovations that we're in the early stages of," he said. "There's a lot of opportunity to tune stuff real-time."
Naffziger also touched on the "race to idle" concept – not a new idea, but one that can increase efficiency by counterintuitively raising power so that a core runs faster for a short time, finishes its task sooner, and can drop into a low-power state earlier than if it had run at normal speed.
As an example, he suggested "inter-frame power gating", in which a video frame is quickly rendered, then the renderer is shut down and memory is put into a low-power or sleep state until the next frame. That may also sound somewhat counterintuitive, seeing as how we perceive video frames as a continuous process – but it most certainly isn't continuous from a processor's point of view.
"It's 33 milliseconds," he said, "which is like all day. If rendering a frame takes 5, 10 milliseconds, then you've got 20-plus milliseconds to sit around and do nothing."
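The arithmetic behind race to idle fits in a few lines. Below is a minimal sketch assuming a 30fps frame period, an always-on cost (memory, fabric) paid only while a frame is rendering, and a low sleep power afterwards. Every wattage is a hypothetical illustration, and `frame_energy_mj` is a name made up for this example.

```python
FRAME_MS = 1000.0 / 30.0  # one frame at 30 fps, about 33 ms


def frame_energy_mj(render_w: float, render_ms: float,
                    base_w: float, sleep_w: float,
                    frame_ms: float = FRAME_MS) -> float:
    """Energy for one frame period, in millijoules (watts x milliseconds).

    While the frame renders, the renderer draws render_w and always-on logic
    (memory, fabric) draws base_w. Once the frame is done, everything is gated
    down to sleep_w until the next frame starts."""
    if render_ms > frame_ms:
        raise ValueError("render time exceeds the frame period")
    active = (render_w + base_w) * render_ms
    idle = sleep_w * (frame_ms - render_ms)
    return active + idle


if __name__ == "__main__":
    # Hypothetical numbers: boosting finishes the frame twice as fast, but the
    # renderer itself spends more energy (45 mJ vs 40 mJ) doing so.
    nominal = frame_energy_mj(render_w=2.0, render_ms=20.0, base_w=1.5, sleep_w=0.1)
    boosted = frame_energy_mj(render_w=4.5, render_ms=10.0, base_w=1.5, sleep_w=0.1)
    print(f"nominal: {nominal:.1f} mJ/frame, boosted: {boosted:.1f} mJ/frame")
```

The boosted renderer burns more energy on the frame itself, yet the total drops from roughly 71 mJ to roughly 62 mJ because memory and fabric get to sleep 10 ms sooner. The result hinges on that always-on cost: set `base_w` to zero in this sketch and boosting loses.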
It may sound whack, but you can reduce overall power by selectively boosting power
What's more, not all video is created equal, so not all of it requires the same amount of rendering time. An APU's video hardware, however, has to be capable of handling the most-demanding video, so it's over-provisioned, Naffziger told us. For most video, all of that capability isn't needed, so between-frame naps can save a boatload of power.
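How much the between-frame naps save depends on how much slack each stream leaves. This minimal sketch compares average power with and without inter-frame gating for a few render times. The "light", "typical" and "worst case" render times and all power figures are hypothetical assumptions, not measurements of any AMD part.

```python
FRAME_MS = 1000.0 / 30.0  # 30 fps frame period, about 33 ms


def avg_power_w(render_ms: float, active_w: float, idle_w: float,
                sleep_w: float, gated: bool, frame_ms: float = FRAME_MS) -> float:
    """Average power over one frame period.

    Without gating, the video block idles at idle_w between frames; with
    inter-frame power gating it drops to sleep_w instead."""
    between_w = sleep_w if gated else idle_w
    energy_mj = active_w * render_ms + between_w * (frame_ms - render_ms)
    return energy_mj / frame_ms


if __name__ == "__main__":
    # Hypothetical render times: the hardware is sized for the worst case,
    # but most streams finish far sooner.
    for label, render_ms in [("light", 5.0), ("typical", 10.0), ("worst case", 30.0)]:
        ungated = avg_power_w(render_ms, active_w=3.0, idle_w=1.0, sleep_w=0.1, gated=False)
        gated = avg_power_w(render_ms, active_w=3.0, idle_w=1.0, sleep_w=0.1, gated=True)
        print(f"{label:>10}: {ungated:.2f} W ungated, {gated:.2f} W gated")
```

In this sketch gating cuts the light stream's average power by more than half, while the worst-case stream, which leaves almost no idle time in each frame, barely benefits. That matches the point that over-provisioned hardware has the most to gain on ordinary content.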
Each and every optimization requires close coordination between software and hardware teams, and that process can be slow going. "You'd be surprised how many years it takes. We prototype these, then have to get the bugs wrung out. Generally this is a three-year development cycle," he said.
"To come in and make bold claims about power-efficiency gains wouldn't be credible if we didn't have this pipeline of multi-year IP development now underway," said the man who's "confident" that AMD can achieve a 25X improvement in power efficiency – despite the fact that silicon is less and less on his side. ®