Idle wild: how Intel's mobile Core i7 speeds up to slow down
Turbo Boost, Thread Parking and the drive for low-power performance
Intel's first mobile Core i7 processors - codenamed 'Clarksfield' - incorporate a feature the chip company is calling Turbo Boost. It's not new - the technology is a part of every 'Nehalem' architecture-based CPU the company has released to date.
So, Turbo Boost can be found in desktop chips and it's in Xeon server parts too. But it really comes into its own in processors produced for laptops.
Intel's Clarksfield: all four cores can be powered down to zero
The mobile Core i7s are all quad-core parts, and while model numbers and operating clock frequencies differ, Turbo Boost works the same way in each case. The technology takes feedback from on-chip thermal sensors and watches how the operating system is scheduling work on the available cores. Using both sources, Turbo Boost determines whether it can lift the chip's clock speed and operating voltage above baseline.
A 2GHz chip with threads scheduled on all four cores has the scope to be dynamically overclocked up to 2.26GHz, provided there's room within the chip's thermal envelope to do so. That's 55W on the 2GHz Core i7-920XM, falling to 45W with the 1.73GHz i7-820QM and the 1.6GHz i7-720QM.
If an application is only making use of two of the four cores, the remaining pair of processing units can be powered right down to zero, dropping the chip's overall thermal output and allowing the two running cores to be clocked anywhere up to 3.06GHz.
The current Clarksfield line-up
A one-thread, one-core application presents even more room for lifting the clock frequency: the other three cores are sent to deep sleep, slashing the heat coming off them and, in turn, allowing Turbo Boost to raise the remaining core's clock frequency to 3.2GHz - 60 per cent higher than the stock clock speed.
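The relationship between busy cores and peak clock speed described above boils down to a lookup. The figures in this sketch are the i7-920XM numbers quoted in this article, not Intel's full bin tables - and since the three-core case isn't quoted, it's assumed here to match the four-core one:

```python
# Simplified model of Turbo Boost's core-count-dependent frequency caps.
# Frequencies (GHz) are the i7-920XM figures quoted in the article; the
# three-core bin is an assumption. Real silicon picks a bin moment by
# moment from thermal-sensor feedback, so these are ceilings, not promises.
TURBO_CAP_GHZ = {1: 3.2, 2: 3.06, 3: 2.26, 4: 2.26}
BASE_GHZ = 2.0

def max_turbo(active_cores: int) -> float:
    """Return the peak clock the chip may reach with this many busy cores."""
    if not 1 <= active_cores <= 4:
        raise ValueError("Clarksfield has four cores")
    return TURBO_CAP_GHZ[active_cores]

def headroom_pct(active_cores: int) -> int:
    """Overclock headroom versus the 2GHz baseline, as a percentage."""
    return round((max_turbo(active_cores) / BASE_GHZ - 1) * 100)
```

With one core busy, `headroom_pct(1)` gives the 60 per cent figure mentioned above; with all four busy it falls to 13 per cent.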
Naturally, each overclocked core pumps out more heat than it would do otherwise, but Intel insists that this won't adversely affect their longevity the way traditional overclocking sometimes can. This is because the Core i7 as a whole stays within its prescribed thermal limits.
Turbo Boost: up the frequency, a little bit for four cores...
Core and die temperatures are sampled constantly, so the degree of overclocking will depend on moment-by-moment heat output, which is dependent not only on the CPU's speed and voltage settings but also on the efficiency of the cooling system fitted on top of it.
...further for two cores...
You might think that upping one or more cores' clock frequency would be a bad idea when the goal is to reduce power consumption as much as possible, the better to extend the runtime of the host laptop's battery charge.
...or even higher when only one core is scheduled
True, Turbo Boost does increase power draw, but Intel maintains that it's better to suffer a burst of power and allow the cores to complete their work more quickly than to take longer processing a task at a lower clock frequency.
It's clearly a balancing act, choosing which of those two strategies you pursue to minimise power draw, and if all four cores are going flat out - which assumes that each one's HyperThreading-enabled virtual second core is in use too - the thermal limit will be hit and there'll be no jump beyond the baseline clock frequency.
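The 'finish fast, then sleep' argument can be illustrated with some back-of-envelope arithmetic. The wattages below are invented for illustration, not Intel's figures; the point is that platform power - chipset, memory, voltage regulators - is drawn for as long as the CPU is awake, so shortening the active period can save energy overall even though the core itself burns more per second:

```python
def task_energy_j(work_gcycles, freq_ghz, core_w, platform_w,
                  idle_w, window_s):
    """Energy (joules) to do a fixed amount of work within a time window.

    The CPU runs at freq_ghz drawing core_w + platform_w until the work
    is done, then idles at idle_w for the rest of the window.
    All the wattages are illustrative, not measured.
    """
    active_s = work_gcycles / freq_ghz
    assert active_s <= window_s, "task must fit in the window"
    return (core_w + platform_w) * active_s + idle_w * (window_s - active_s)

# Same 10-gigacycle task, five-second window, 10W of always-on platform power:
burst = task_energy_j(10, 3.2, core_w=25, platform_w=10, idle_w=0.5, window_s=5)
slow  = task_energy_j(10, 2.0, core_w=15, platform_w=10, idle_w=0.5, window_s=5)
# burst comes to about 110J against 125J for slow: racing to idle wins here.
```

Shrink the platform power toward zero and the slower run wins instead - which is exactly why choosing between the two strategies is a balancing act.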
Intel's widget shows the level of overclocking
The use of Turbo Boost tech might seem to imply that single-threaded apps are better since they run at higher clock speeds, but four cores operating simultaneously at 2-2.26GHz should deliver better performance overall than a single core at 3.2GHz. Intel wants no let-up in software developers' efforts to exploit opportunities for parallelism within their code and create apps that spawn multiple threads.
Run the same 1080p HD video conversion on a 2.3GHz Core 2 Duo T2700 and on a laptop with a 1.6-2.8GHz i7-720QM, says Intel, and the latter will be done in 38 minutes. The first system will take more than two-and-a-quarter hours: 137 minutes, roughly 3.6 times as long.
That's not just Turbo Boost, HyperThreading and the extra two physical cores, mind - having access to faster, 1333MHz DDR3 memory and 8MB of shared L3 cache plays its part too.
And for all the talk of "intelligent performance" from marketing types, Turbo Boost is really about power conservation. Increasing performance is HyperThreading's job, by maximising the scope for running tasks at the same time. Turbo Boost, where it can, helps those tasks complete more quickly. The sooner a task is done, the sooner a core can tell the OS it's finished and the operating system can, using established SpeedStep technology, tell it to go to sleep for a while.
Performance boost: how much faster Intel claims Nehalem is over previous chips
Click for a full-size chart
The whole approach is to get the CPU back to idle as soon as possible without penalising performance when it's needed. That's why it'll pay off even when the notebook is connected to its AC adaptor: ensuring the CPU cores aren't going flat out all the time conserves energy whether it comes from the battery or from the mains.
Powering down the processor more quickly should also mean the cooling fan runs less often. No one likes a noisy notebook, and since Turbo Boost operates within the existing CPU thermal envelope, the fan shouldn't need to spin any faster, either.
We won't be able to say how well it works until real-world tests are conducted - watch this space.
Getting cores into an idle state slashes power consumption
Turbo Boost operates independently of the operating system - unlike SpeedStep and HyperThreading, for example - so it'll benefit Windows, Linux and Mac OS X users equally.
Windows Vista currently is well able to make use of HyperThreading and SpeedStep, but Windows 7 adds some new tricks to make smarter use of the available processing resources, the better to minimise power drain when performance isn't paramount.
Using a technique called thread-parking, Windows 7's scheduler is, Microsoft claims, better able to allocate resources than Vista's, switching threads in flight from virtual to physical cores so those threads can complete more quickly. It's about ensuring the OS understands there's a difference between physical and virtual cores rather than simply seeing a Core i7 as an octo-core chip, as Vista essentially does.
When all the physical cores are busy, then 'parked' virtual cores are given tasks to run. Again, this is about using the fastest resources first in order to complete work more quickly and then power down cores to preserve the battery charge. It should also make the system more responsive.
Trying to hold processing resources in reserve this way means that they're ready to be called upon when a peak in demand occurs, ensuring that the user interface doesn't freeze when other tasks are grabbing lots of CPU cycles. Of course, there are instances when the load pattern means this is going to happen anyway, but smarter scheduling and the extra headroom provided by Turbo Boost should minimise the frequency of such moments.
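The 'physical cores first' policy described above can be sketched in a few lines. The topology and function names here are invented for illustration - Windows 7's actual scheduler is far more elaborate:

```python
# Hypothetical HT-aware placement: fill one logical CPU per physical core
# before touching any virtual (SMT sibling) CPU, which stays 'parked'.
# Topology: physical core id -> [primary logical CPU, SMT sibling].
TOPOLOGY = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}

def place_threads(n_threads):
    """Return the logical CPUs the first n_threads runnable threads get."""
    primaries = [cpus[0] for cpus in TOPOLOGY.values()]
    siblings = [cpus[1] for cpus in TOPOLOGY.values()]
    # Physical cores first; siblings only once every core has a thread.
    return (primaries + siblings)[:n_threads]
```

Four threads land on the four physical cores and the virtual ones stay parked - and powered down, with Turbo Boost headroom intact; only a fifth thread presses an SMT sibling into service.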
Again, only real-world tests will show how valid the claims made by Intel and Microsoft are - and whether they make a difference when so much system power is drawn by, say, the display, a device that consumes watts whether the CPU is on a light load or a heavy one.
Snow Leopard's Grand Central Dispatch system, which allows software developers to stop worrying about threads altogether and leave scheduling to the OS, has the potential to give Mac OS X greater control over thread scheduling, but it's not clear from Apple's GCD documentation whether its scheduler is as HT-savvy as Windows 7's.
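For readers unfamiliar with the model: GCD has developers submit self-contained work items to queues and leaves thread creation and placement to the OS. A rough Python analogue - not GCD itself - using the standard library's thread pool shows the shape of it:

```python
from concurrent.futures import ThreadPoolExecutor

# As with GCD, the programmer submits tasks, not threads. The runtime
# (here ThreadPoolExecutor; in GCD, the OS) owns thread creation and
# scheduling, so it can size the pool to match the hardware.
def convert(frame):
    return frame * 2  # stand-in for a chunk of real work

with ThreadPoolExecutor() as pool:  # pool size chosen by the runtime
    results = list(pool.map(convert, range(8)))
# results == [0, 2, 4, 6, 8, 10, 12, 14]
```

Because the runtime decides where tasks run, it is the natural place to fold in knowledge of physical versus virtual cores - which is precisely the detail Apple's documentation doesn't spell out.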
Only four threads to schedule? Thread Parking ensures they'll only be assigned to physical cores
The Nehalem-derived Xeon 5500 processors Apple builds into the Mac Pro and Xserve have HyperThreading, though none of the CPUs in its other machines yet does, so the feature shouldn't be alien to Apple.
Linux has certainly supported HyperThreading for years, and it has supported thread parking for quite a while too, kernel scheduler hacker Ingo Molnar told Register Hardware.
Indeed, the next major kernel release, 2.6.32, due in December, will include "further tweaks" for SMT load-balancing, he said, allowing the scheduler to "adapt to the momentary performance profile of each socket, core (and thread) on the system - even if they are asymmetric".
In short, it'll not simply favour real cores over virtual ones but also faster-running physical cores over slower ones, monitoring the state of each and switching threads as each core's frequency changes. The feature will be built into 2.6.32 but disabled by default. ®