The Register® — Biting the hand that feeds IT

IBM slips automatic tranny into Power7

Self-aware chippery

Hot Chips "Civilization advances by extending the number of important operations which we can perform without thinking about them," said mathematician Alfred North Whitehead, in his 1911 tome, An Introduction to Mathematics. And with its Autonomic Computing effort, IBM believes it's advancing civilization.

On Monday in Palo Alto, California, IBM gave attendees at this year's Hot Chips conference a deep dive into three of the latest developments in its nearly decade-long effort to create computers that dynamically self-optimize.

"As good engineers we always put guardband in just to cover our tails," Power7 EnergyScale architect Michael Floyd told his Hot Chips audience — somewhat less elegantly than Lord Whitehead, whose quote introduces Autonomic Computing on Big Blue's website. Floyd was introducing one of the three autonomic features in the Power7 processor's EnergyScale power-management system: the reduction of wasteful guardband.

By "guardband", Floyd was referring to the necessary but inefficient practice of adding a smidgen of time into each clock cycle so that variabilities in, say, clock and data signals won't mess up signal-recognition timing.

Those variabilities can result from a broad range of unwelcome timing tweakers: voltage droop and power supply variabilities, thermal variabilities, processor aging, and more.

The problem with most guardband schemes is that they're static — they're built into the chip, and as such need to guard against worst-case scenarios. What IBM has added to EnergyScale for the Power7 — and which wasn't in EnergyScale for the Power6 — is a Critical Path Monitor (CPM) that performs what the company calls "circuit margin feedback" to monitor and adjust guardband in real time.

According to Floyd, the benefits of CPM-based dynamic guardbanding can be used to either boost performance or save energy. He claimed that IBM testing has shown that such a technique — all else being equal — can either allow a CPU to be overclocked by 7.3 per cent or have its power needs reduced by up to 15.8 per cent.
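
The idea behind dynamic guardbanding can be sketched as a simple feedback loop. The toy Python model below is not IBM's CPM logic: the margin formula, the 1 per cent target, the 25MHz step, the 4,000MHz part, and every number fed into it are assumptions chosen for illustration. It only shows why a clock that tracks measured margin can sit closer to the silicon's limit than one fixed for the worst case.

```python
def timing_margin(freq_mhz, limit_mhz):
    """Fraction of the cycle left over at freq_mhz (assumed model, not IBM's)."""
    return (limit_mhz - freq_mhz) / limit_mhz


def track_margin(start_mhz, f_max_mhz, variation_trace,
                 target_margin=0.01, step_mhz=25.0):
    """Nudge the clock one step per sample so the margin stays near target.

    variation_trace holds hypothetical fractions of f_max lost to droop,
    temperature, aging and so on at each sample.
    """
    freq = start_mhz
    history = []
    for variation in variation_trace:
        limit = f_max_mhz * (1.0 - variation)  # fastest safe clock right now
        if timing_margin(freq, limit) < target_margin:
            freq -= step_mhz                   # margin too thin: back off
        elif timing_margin(freq + step_mhz, limit) >= target_margin:
            freq += step_mhz                   # room to spare: speed up
        history.append(freq)
    return history


# Static design: 10 per cent guardband on a hypothetical 4,000MHz part.
static_mhz = 3600.0
# Quiet conditions (2 per cent variation), then a droop to 6 per cent.
trace = [0.02] * 15 + [0.06] * 10
history = track_margin(static_mhz, 4000.0, trace)
print("static:", static_mhz, "quiet:", history[14], "after droop:", history[-1])
```

In this toy run the tracked clock settles well above the static 3,600MHz setting while conditions are quiet, then steps back down when the simulated droop arrives. Real hardware would have to react far faster than one step per sample, since a margin that is too thin means a timing failure rather than a log entry.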

For you product-testing geeks out there, Floyd got these numbers from a 32-core IBM Power 750 Express Server with 64GB of memory, running SPECpower_ssj2008 at 100 per cent load, and with the EnergyScale policy set at DPS-FP. Oh, and the ambient temp was 22°C.

A second autonomic feature in the Power7 EnergyScale scheme is low-activity detection (LAD), which drops processor frequency, thus saving power, when the processor has nothing better to do — for example, when running memory-bound workloads and waiting for data to arrive.

"As you guys know," Floyd said to the geek-filled crowd, "a lot of workloads are memory-bound, or at least certain points in time are memory-bound, and you don't always need the full processor at peak frequency during those times. The interesting thing that we found ... is that systems that appear to be 100 per cent utilized when using traditional metrics to measure what the system's doing, actually are 100 per cent idle."

"This may sound counterintuitive at first, but if you're running an idle loop, or if you're polling a place in memory, you may be really busy but technically you're idle — and you're not getting a whole lot of work done," he said. Not good — a waste of power.

So when the LAD detects such a condition, it instructs the digital PLL (DPLL) to drop the frequency — which it can do at 25MHz resolution at an up-or-down speed of 50MHz per microsecond, dropping frequency by up to 50 per cent (or, in other scenarios, raising it by up to 10 per cent). This same effect can be accessed by software using a technique that Floyd called Green Polling.
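
Those DPLL figures invite some back-of-the-envelope arithmetic. The sketch below assumes a nominal clock of 3,500MHz, which is a placeholder and not a figure from Floyd's talk, and simply applies the 25MHz resolution and 50MHz-per-microsecond slew rate quoted above.

```python
STEP_MHZ = 25.0          # DPLL resolution quoted in the talk
SLEW_MHZ_PER_US = 50.0   # up-or-down slew rate quoted in the talk


def quantize(freq_mhz, step_mhz=STEP_MHZ):
    """Snap a frequency to the nearest DPLL step."""
    return round(freq_mhz / step_mhz) * step_mhz


def ramp_time_us(start_mhz, target_mhz, slew=SLEW_MHZ_PER_US):
    """Microseconds needed to slew between two frequencies."""
    return abs(target_mhz - start_mhz) / slew


nominal = 3500.0                      # ASSUMED nominal clock, illustration only
lad_floor = quantize(nominal * 0.5)   # up to 50 per cent lower
boost_cap = quantize(nominal * 1.1)   # up to 10 per cent higher

print(f"drop to {lad_floor:.0f}MHz takes {ramp_time_us(nominal, lad_floor):.0f}us")
print(f"boost to {boost_cap:.0f}MHz takes {ramp_time_us(nominal, boost_cap):.0f}us")
```

Under that assumed clock, the full 50 per cent drop takes 35 microseconds and the 10 per cent boost takes 7.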

The third autonomic power-saving upgrade in Power7 EnergyScale is what Floyd called the processor-core Power Proxy — a way of finding out what a processor core's power consumption is without directly measuring it.

"We have eight processor cores running on this chip, and due to system constraints we can't put an external voltage regulator on each one of these processor cores — it's too prohibitive," said Floyd, explaining why all the cores share the same voltage point.

To be more exact, to use IBM lingo, it's not just the Power7's cores that share the same voltage point, but all its "chiplets" — that's Big Blue's term for a core, its associated L2 and L3 caches, and some connective tissue.

This lack of per-chiplet voltage regulators creates a problem for power management: "You can't make intelligent decisions because you don't know how much each of their eight processor-core chiplets is burning," Floyd pointed out. "You can see how much they're burning as a whole, but that doesn't help if you're trying to do power shifting or power trade-offs between the multiple cores."

To the rescue comes the Power Proxy scheme, a hardware-based system that samples activities in different areas of each chiplet — e.g., a cache read or write, an execution pipeline issue, or some such — then weights each activity to represent how much power it consumes, combines the weighted results from the chiplet's subsections, then sends the final stats off to the EnergyScale firmware.
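
In software terms, the Power Proxy boils down to a weighted sum of activity counts. The Python sketch below shows that arithmetic only; the event names, the per-event energy weights, the idle-power term, and the sampling interval are all invented for illustration and are not IBM's calibration values.

```python
# Hypothetical energy cost per event, in nanojoules (not IBM's numbers).
WEIGHTS_NJ = {
    "cache_read": 2.0,
    "cache_write": 3.0,
    "pipeline_issue": 1.0,
}


def power_proxy_watts(counts, interval_s, weights_nj=WEIGHTS_NJ, idle_watts=0.0):
    """Estimate one chiplet's power from activity counts in a sample window."""
    # Weight each activity count by its assumed energy cost, then sum.
    energy_nj = sum(weights_nj[event] * n for event, n in counts.items())
    # Energy divided by the window length gives average power.
    return idle_watts + (energy_nj * 1e-9) / interval_s


# Made-up activity counts for one chiplet over a 1-microsecond window.
sample = {"cache_read": 1000, "cache_write": 500, "pipeline_issue": 4000}
estimate = power_proxy_watts(sample, interval_s=1e-6, idle_watts=1.0)
print(f"estimated chiplet power: {estimate:.2f} W")
```

The appeal of the approach is that the counters are cheap to sample, while the accuracy of the estimate rests entirely on how well the weights are calibrated against real measurements.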

That firmware, in turn, treats the Power Proxy inputs as if they were direct measurements of power rather than estimates based on chiplet subsection activity, and allocates power among cores — or even to other components such as memory — as needed.
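
What the firmware does with those estimates is policy, and the talk as reported here doesn't spell it out. As one hypothetical policy, the sketch below scales every chiplet's estimated draw by the same factor whenever the total exceeds a budget. The function name, the budget, and the scaling rule are assumptions, not EnergyScale's algorithm.

```python
def shift_power(estimates_w, budget_w):
    """Return per-chiplet power caps that fit within budget_w.

    If the estimates already fit, each chiplet keeps its estimate;
    otherwise every chiplet is scaled back by the same factor.
    """
    total = sum(estimates_w)
    if total <= budget_w or total == 0:
        return list(estimates_w)
    scale = budget_w / total
    return [w * scale for w in estimates_w]


# Made-up proxy readings for eight chiplets, in watts.
readings = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
caps = shift_power(readings, budget_w=100.0)
print(caps)
```

A real policy would presumably weigh workload priority and the needs of other components such as memory, but even this crude rule depends on having a per-chiplet number to work from, which is exactly what the proxy supplies.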

These three new autonomic features are just the latest additions to IBM's EnergyScale architecture. A more-complete discussion of the power-saving features in EnergyScale's Power7 implementation, including its policy-based, customer-managed tunability, can be found in a 51-page white paper (PDF). ®

Latest Comments

But

Only if they publish their Source Code.

Take a dekko at the first-ever Open Source releases of Mozilla and OpenOffice.org for an example of the sorts of things that developers of closed-source software have been known to think they could get away with because nobody was watching.

Well

...there are rumors that some developers actually use proper synchronization techniques like pthread_mutex_lock(), select(), WaitForSingleObject() and signal handlers.

Been there done that

I use my hand to determine which burners are on and turn off the ones i dont need once the grill is up to heat and I dont need all the grills on

Smart apps first

I'm sure this works great on their smart code, but the first time you run it with typical code, all of the benefits are gone because of how wait timing is done, which is usually spin the CPU for N seconds, and then try again...

Or am I being overly pessimistic?

Think I accidentally invented something like this once

I had an idea for a processor that would run flat-out, as fast as propagation delays would let it, by using an oscillator formed from an even number of NOT gates and a NAND or NOR gate (just so you can stop the oscillator dead when needed). That much, of course, isn't new. My idea was to distribute the gates (which are small enough) throughout the silicon, so that anything that increased the propagation time through one region would automatically slow down the oscillator.

(Side note: Does discussing this on the Reg Forums count as Prior Art?)
