Deep, deep dive inside Intel's next-generation processor
Join us on a whirlwind Haswell holiday – non-geeks heartily welcomed
At Intel's developer shindig last week, chippery engineers spent a goodly amount of time conducting tech sessions that detailed the company's upcoming 4th-generation Core microprocessor architecture, code-named "Haswell."
We thought that you, inordinately intelligent and tech-savvy Reg reader, might enjoy a deep dive into their handiwork.
The new Haswell microarchitecture – likely named after the tiny Colorado town and not the Australian red-groined froglet – was touted by Intel's Architecture Group headman David Perlmutter as being "designed with mobility in mind." In pursuit of that goal, he said that Haswell will require just one-twentieth of the idle power – that's full platform power, not just CPU power – of the second-generation Core processors, code-named Sandy Bridge.
Perlmutter emphasized Haswell's future appearance in "sleek tablets" and Ultrabooks, followed "eventually" by desktops and workstations. In the more-technical sessions that followed Perlmutter's rather fluffy keynote, however, Intel engineers added data centers to Haswell's future turf.
During those sessions, a wealth of Haswell details were shared, explaining how Intel is counting on 22-nanometer Haswell chips to be faster, more power-miserly, and more media-friendly than their predecessors – and to finally move Intel into the tablet and handset market that continues to elude it.
Just like Ivy Bridge, except when it isn't
The design of Haswell's compute cores – two to four of them when it first appears in its client incarnations late next year – is an evolutionary, not revolutionary, step beyond those in today's 22-nanometer, 3rd-generation Ivy Bridge processors.
"The starting point is what came before it with Sandy Bridge and Ivy Bridge," Intel engineer Ronak Singhal told attendees at one Haswell session, citing Turbo Boost technology, hyperthreading, integrated graphics on the same die as the compute cores, the ring interconnect between the various elements such as the compute and graphics cores, and the shared cache between those two core types.
"All of those are features that are carried over into the Haswell generation," he said. In many cases, however, those technologies have been tweaked – more on those tweaks in a moment.
As The Reg told you last week, one of Haswell's key features is that its high degree of modularity will allow it to be used on a broader range of processors than Intel has attempted before in any of its previous microarchitectures – one core to rule them all, as it were.
Different usage models, of course, will require different power levels. How much juice you can feed a Haswell-based chip will be one factor determining how high its performance will be. A higher-performing chip will require more power – as if you didn't already know that.
But determining a processor's power usage is not simply a matter of deciding how much juice to feed it – there's a lot of in-chip dynamic power-management going on, and the Haswell engineers focused intently on tweaking the architecture's capabilities in that regard.
In addition to those tweaks, Haswell carries over a number of power-management features of Sandy Bridge and Ivy Bridge. Those chips have essentially two classes of power states: active and sleep. Haswell provides both of those two states, as well, but adds a new state that Intel calls "active idle".
Those blue circles hiding behind Haswell's green power states belong to Mr. and Ms. Sandy and Ivy Bridge
In this state, as explained by Intel Fellow Per Hammarlund, "The OS and the software on top of the hardware thinks that the hardware is active ... but in reality we're achieving power levels that are associated with the previous idle state."
This new active idle state is what enables Haswell to achieve the 20X improvement Perlmutter referred to in his keynote. "This is really what enables the key benefit in battery life for Haswell," Hammarlund said, noting that state changes can occur in milliseconds, or at most hundreds of milliseconds – swift, indeed.
If you're a developer, fear not about re-coding your apps to take advantage of this new capability. According to Hammarlund, it's all handled in hardware combined with firmware, and it will all be done for your app automagically and continuously.
He did offer the caveat that said magic will work only for "well-written software", but noted that "The key here is that most software is actually fairly decently written and will take advantage of these power modes ... and you will get these 20X idle power improvements for free."
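Hammarlund didn't spell out what "well-written" means. One common reading – and this is our assumption, not Intel's definition – is software that blocks until there's work to do rather than waking the processor on a timer to check. Here's a minimal Python sketch of the difference. The names polling_wait, blocking_wait, and run are made up for illustration, and the wakeup counts describe only this toy program, not any Haswell measurement.

```python
import threading
import time


def polling_wait(flag, interval=0.001):
    """Poorly behaved: wake every `interval` seconds just to check the flag.

    Returns the number of times the thread woke up before the flag was set.
    """
    wakeups = 0
    while not flag.is_set():
        time.sleep(interval)  # each sleep ends in a wakeup
        wakeups += 1
    return wakeups


def blocking_wait(flag):
    """Better behaved: block until the flag is set, then wake once."""
    flag.wait()
    return 1


def run(waiter, delay=0.2):
    """Set a flag after `delay` seconds and report how often `waiter` woke."""
    flag = threading.Event()
    timer = threading.Timer(delay, flag.set)
    timer.start()
    wakeups = waiter(flag)
    timer.join()
    return wakeups


if __name__ == "__main__":
    print("polling wakeups: ", run(polling_wait))
    print("blocking wakeups:", run(blocking_wait))
```

The polling version wakes over and over during the same wait that the blocking version handles with a single wakeup. Fewer wakeups should give the hardware longer quiet stretches in which to drop into its low-power states, which is presumably the behavior Hammarlund had in mind.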
In addition, Intel has improved power management by lowering the power required by both Haswell's active and sleep states, and by improving the transition time from active to sleep. The transition time from active idle to sleep, the company says, is also quite snappy.
Remember megahertz and gigahertz marketing? In today's mobile world it's all about power management
The active idle state is not the only new hotness in Haswell's power management. The temporary clock-speed boosting Turbo Mode has been tweaked to be more power-efficient, for example, as well as being extended upwards into higher gigahertz levels for more performance headroom.
One way the extra voltage has been freed to give Turbo Mode more boost is by decoupling the voltage and frequency of various elements on the die from one another to allow for more fine-grained power control. This provides the ability to better shift power from where it's not needed to where it is.
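Intel didn't walk through the arithmetic, but the textbook approximation for CMOS switching power, P = C × V² × f, gives a rough idea of why decoupling helps. The sketch below is ours, not Intel's: the two domains, their capacitance and frequency values, and the voltages are made-up, unitless numbers chosen only to show the shape of the saving.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook CMOS switching-power approximation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical, unitless values for illustration only (not Haswell figures).
CORE = {"capacitance": 1.0, "frequency": 3.0}
GRAPHICS = {"capacitance": 1.0, "frequency": 1.0}


def total_power(core_voltage, graphics_voltage):
    """Sum the switching power of both domains at the given voltages."""
    return (
        dynamic_power(CORE["capacitance"], core_voltage, CORE["frequency"])
        + dynamic_power(GRAPHICS["capacitance"], graphics_voltage, GRAPHICS["frequency"])
    )


# Coupled: the slow graphics domain is stuck at the core's higher voltage.
coupled = total_power(core_voltage=1.0, graphics_voltage=1.0)

# Decoupled: graphics drops to the lower voltage its slow clock can live with.
decoupled = total_power(core_voltage=1.0, graphics_voltage=0.8)

# Power no longer spent on graphics is headroom that could go elsewhere.
headroom = coupled - decoupled

if __name__ == "__main__":
    print(f"coupled={coupled:.2f} decoupled={decoupled:.2f} headroom={headroom:.2f}")
```

Because voltage enters the formula squared, even a modest voltage cut on a lightly loaded domain frees a disproportionate slice of the power budget, which is the kind of slack a turbo mode can then spend on the cores.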
Other tweaks include finer-grained control of which parts of the die are on and off at any one time. "In reality, it's mostly about making sure everything is off all the time," Hammarlund said. "If you don't need it, it's off. That's the philosophy."
True chip geeks will be happy to know that Haswell has additional and deeper C-states – power modes – and that the transition times between C-states have been improved by as much as 25 per cent. Not a true chip geek, sir or madam? Don't worry about it – all this means is that your Haswell-equipped mobile device's battery will likely last longer.
Software decently written?
"The key here is that most software is actually fairly decently written"
He must be joking...
Pentium 4 was deemed as having very poor performance because to take advantage of it, software needed to be "fairly decently written", and compiled with a decent compiler. The problem is that to date there is only one compiler worth a damn for x86 - Intel's own (ICC).
I did some performance testing a while back:
Clock-for-clock, with crap compilers (GCC, PGCC) Pentium 4 is about 40% slower than Pentium 3. But with ICC, Pentium 4's performance actually goes up by 20%, clock-for-clock, compared to a Pentium 3.
It's not just down to software being decently written (which it isn't a lot of the time) - it's also down to the compiler doing a decent job (which most don't). On one hand, one could argue something along the lines of: "Pentium 4 didn't suck - you were merely too stupid to use it properly." Unfortunately, this is way, way beyond the average consumer to either understand or do anything about and it is the consumer's perception that decides whether a product is going to be a success or failure.
Wait ... what?
"Intel CEO Paul Otellini has called "the third pillar of computing," security – the other two pillars being energy efficiency and internet connectivity."
This twat is a marketard, not an engineer ... he wouldn't know the difference between ones & zeros if he got 'em under his carefully manicured fingernails. He's part of the (current) reason Intel is heading for the bit-bucket.
By way of reference, the real three pillars of computing are memory, IO, and CPU ...
"We thought that you, inordinately intelligent and tech-savvy Reg reader, might enjoy a deep dive into their handiwork."
Have you read all the comments on The Register? :P
Re: that hidden message
Darker-skinned folks' faces are OK to start with.
Re: AVX2 on integers
The AVX2 (long long) integer operations will be in there for cryptographic processing.
The really interesting bit is the transactional memory TSX extensions (IBM’s is already well along the curve). TSX should be a big kicker for TP and HPC, but writing software to take advantage of it is going to take a big paradigm shift away from garbage collection to Bedouin memory management.