IBM slips automatic tranny into Power7
Hot Chips "Civilization advances by extending the number of important operations which we can perform without thinking about them," said mathematician Alfred North Whitehead in his 1911 tome, An Introduction to Mathematics. And with its Autonomic Computing effort, IBM believes it's advancing civilization.
On Monday in Palo Alto, California, IBM gave attendees at this year's Hot Chips conference a deep dive into three of the latest developments in its nearly decade-long effort to create computers that dynamically self-optimize.
"As good engineers we always put guardband in just to cover our tails," Power7 EnergyScale architect Michael Floyd told his Hot Chips audience, somewhat less elegantly than Whitehead, whose quote introduces Autonomic Computing on Big Blue's website. Floyd was introducing one of the three autonomic features in the Power7 processor's EnergyScale power-management system: the reduction of wasteful guardband.
By "guardband", Floyd was referring to the necessary but inefficient practice of adding a smidgen of time to each clock cycle so that variabilities in, say, clock and data signals won't mess up signal-recognition timing.
Those variabilities can result from a broad range of unwelcome timing tweakers: voltage droop and power supply variabilities, thermal variabilities, processor aging, and more.
The problem with most guardband schemes is that they're static: they're built into the chip, and as such need to guard against worst-case scenarios. What IBM has added to EnergyScale for the Power7, and what wasn't in EnergyScale for the Power6, is a Critical Path Monitor (CPM) that performs what the company calls "circuit margin feedback" to monitor and adjust guardband in real time.
According to Floyd, the benefits of CPM-based dynamic guardbanding can be used to either boost performance or save energy. He claimed that IBM testing has shown that such a technique — all else being equal — can either allow a CPU to be overclocked by 7.3 per cent or have its power needs reduced by up to 15.8 per cent.
For you product-testing geeks out there, Floyd got these numbers from a 32-core IBM Power 750 Express Server with 64GB of memory, running SPECPower_ssj at 100 per cent load, and with the EnergyScale policy set at DPS-FP. Oh, and the ambient temp was 22°C.
A second autonomic feature in the Power7 EnergyScale scheme is low-activity detection (LAD), which drops processor frequency, thus saving power, when the processor has nothing better to do — for example, when running memory-bound workloads and waiting for data to arrive.
"As you guys know," Floyd said to the geek-filled crowd, "a lot of workloads are memory-bound, or at least certain points in time are memory-bound, and you don't always need the full processor at peak frequency during those times. The interesting thing that we found ... is that systems that appear to be 100 per cent utilized when using traditional metrics to measure what the system's doing, actually are 100 per cent idle."
"This may sound counterintuitive at first, but if you're running an idle loop, or if you're polling a place in memory, you may be really busy but technically you're idle — and you're not getting a whole lot of work done," he said. Not good — a waste of power.
So when the LAD detects such a condition, it instructs the digital PLL (DPLL) to drop the frequency — which it can do at 25MHz resolution at an up-or-down speed of 50MHz per microsecond, dropping frequency by up to 50 per cent (or, in other scenarios, raising it by up to 10 per cent). This same effect can be accessed by software using a technique that Floyd called Green Polling.
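To put those DPLL figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 25MHz resolution, the 50MHz-per-microsecond slew rate, and the 50 per cent and 10 per cent limits come from Floyd's talk; the 3GHz nominal clock and every function name are assumptions made for illustration, not details of IBM's hardware or firmware.

```python
# Figures quoted in Floyd's talk.
STEP_MHZ = 25.0          # DPLL frequency resolution
SLEW_MHZ_PER_US = 50.0   # fastest rate of change, up or down
MAX_DROP = 0.50          # frequency may fall by up to 50 per cent
MAX_BOOST = 0.10         # or rise by up to 10 per cent


def clamp_target(nominal_mhz, requested_mhz):
    """Keep a requested frequency inside the -50%/+10% window."""
    low = nominal_mhz * (1.0 - MAX_DROP)
    high = nominal_mhz * (1.0 + MAX_BOOST)
    return min(max(requested_mhz, low), high)


def ramp_time_us(start_mhz, target_mhz):
    """Shortest time to move between two frequencies at the quoted slew rate."""
    return abs(target_mhz - start_mhz) / SLEW_MHZ_PER_US


def step_count(start_mhz, target_mhz):
    """Number of 25MHz steps between two frequencies."""
    return round(abs(target_mhz - start_mhz) / STEP_MHZ)


NOMINAL_MHZ = 3000.0  # hypothetical nominal clock, not a figure from the talk

floor_mhz = clamp_target(NOMINAL_MHZ, 0.0)
print(floor_mhz, ramp_time_us(NOMINAL_MHZ, floor_mhz), step_count(NOMINAL_MHZ, floor_mhz))
```

On that assumed 3GHz clock, a full 50 per cent drop to 1.5GHz would take 60 steps and 30 microseconds, which gives a feel for how quickly the hardware could react to a memory stall.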
The third autonomic power-saving upgrade in Power7 EnergyScale is what Floyd called the processor-core Power Proxy — a way of finding out what a processor core's power consumption is without directly measuring it.
"We have eight processor cores running on this chip, and due to system constraints we can't put an external voltage regulator on each one of these processor cores — it's too prohibitive," said Floyd, explaining why all the cores share the same voltage point.
To be more exact, to use IBM lingo, it's not just the Power7's cores that share the same voltage point, but all its "chiplets" — that's Big Blue's term for a core, its associated L2 and L3 caches, and some connective tissue.
This lack of per-chiplet voltage regulators creates a problem for power management: "You can't make intelligent decisions because you don't know how much each of their eight processor-core chiplets is burning," Floyd pointed out. "You can see how much they're burning as a whole, but that doesn't help if you're trying to do power shifting or power trade-offs between the multiple cores."
To the rescue comes the Power Proxy scheme, a hardware-based system that samples activities in different areas of each chiplet — e.g., a cache read or write, an execution pipeline issue, or some such — then weights each activity to represent how much power it consumes, combines the weighted results from the chiplet's subsections, then sends the final stats off to the EnergyScale firmware.
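The sample-weight-combine idea can be sketched in a few lines of Python. This is only an illustration of a weighted activity sum: the event names, the weights, and the subsection labels below are all hypothetical, since the talk as reported here gives no actual counters or coefficients, and the real Power Proxy is implemented in hardware.

```python
# Hypothetical weights: relative power cost per event, in arbitrary units.
# None of these values come from IBM.
WEIGHTS = {
    "cache_read": 2.0,
    "cache_write": 3.0,
    "pipeline_issue": 1.0,
}


def subsection_estimate(counts, weights=WEIGHTS):
    """Weight each sampled activity count by its assumed power cost."""
    return sum(weights[event] * n for event, n in counts.items())


def chiplet_proxy(subsections, weights=WEIGHTS):
    """Combine the weighted results from every subsection of one chiplet."""
    return sum(subsection_estimate(counts, weights) for counts in subsections.values())


# One sampling window for one chiplet (made-up activity counts).
sample = {
    "l2": {"cache_read": 100, "cache_write": 40},
    "core": {"pipeline_issue": 500},
}

print(chiplet_proxy(sample))  # 2*100 + 3*40 + 1*500 = 820.0
```

The single number that comes out is what would be handed to the firmware as that chiplet's estimated power draw for the window.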
That firmware, in turn, treats the Power Proxy inputs as if they were direct measurements of power rather than estimates based on chiplet subsection activity, and allocates power among cores — or even to other components such as memory — as needed.
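Floyd didn't spell out how the firmware divides power up, so the following Python sketch shows just one simple possibility: splitting a fixed chip budget in proportion to each chiplet's proxy estimate. The proportional policy, the function name, and the numbers are all assumptions for illustration, not IBM's algorithm.

```python
def shift_power(budget_watts, proxy_estimates):
    """Split a power budget across chiplets in proportion to their proxy estimates.

    Hypothetical policy: busier chiplets (higher estimates) get a larger share.
    """
    total = sum(proxy_estimates)
    if total == 0:
        # Nothing is active, so fall back to an even split.
        share = budget_watts / len(proxy_estimates)
        return [share] * len(proxy_estimates)
    return [budget_watts * estimate / total for estimate in proxy_estimates]


# Three chiplets, one twice as busy as the other two (made-up values).
print(shift_power(80.0, [820.0, 410.0, 410.0]))  # [40.0, 20.0, 20.0]
```

Whatever the real policy looks like, the point is the same: without per-chiplet estimates there would be nothing to base such a split on.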
These three new autonomic features are just the latest additions to IBM's EnergyScale architecture. A more complete discussion of the power-saving features in EnergyScale's Power7 implementation, including its policy-based, customer-managed tunability, can be found in a 51-page white paper, here (PDF). ®