Deep, deep dive inside Intel's next-generation processor

Join us on a whirlwind Haswell holiday – non-geeks heartily welcomed

Graphics and media: much better this time – we promise

The performance of Intel's integrated, on-chip graphics has never been what one might call stellar. The company's graphics guru, Senior Fellow Tom Piazza, says that Haswell will provide significant improvements over the performance of the Sandy Bridge and Ivy Bridge graphics cores, and that further improvements will appear in future generations as well.

Piazza reminded his IDF audience that in 2008 the company's goal was to get a ten-fold improvement in integrated graphics performance by 2010 – a goal dubbed 10X by 10, and one that Intel said it had achieved with the release of Sandy Bridge.

"I remember being here a few years ago with the 10X by 10 in Sandy Bridge, and I remember saying 'This is just the beginning'," he said. "I'll sit here and tell you today that Haswell is not the end. We're going to keep on going from here."

Intel's goal with the Haswell graphics architecture was – as is true with the compute cores as well – to improve performance without sacrificing power efficiency.

Like the compute cores, Haswell's graphics draw heavily from the graphics cores in the Sandy Bridge and Ivy Bridge architectures – a "similar microarchitecture with some embellishments on it," as Piazza put it. Some of those embellishments are designed to help performance – significantly so in media handling – hand-in-hand with power savings. The graphics subsystem also falls in line with the compute cores in that it focuses on both modularity and scalability.

Slide from Intel Developer Forum 2012 providing details of Intel's 4th Generation Core Processor, codenamed 'Haswell'

Ah, if graphics and media engineering were only as simple as drawing pretty pink and blue boxes

As an example of the design's scalability, Haswell graphics will be offered in three flavors – GT1, GT2, and GT3 – all sharing the same driver stack. The highest level doubles many of the sub-elements of the graphics core, and in some scenarios does so without increasing its power requirements, Piazza said.

One trick that Haswell uses to accomplish this is to decouple the clocks of the compute cores, the graphics core, and the ring bus that connects them – they were locked together in Sandy Bridge and Ivy Bridge. In those two architectures, if you wanted to "turbo up" the graphics core and the ring bus when extra performance was desired, you had to raise the voltage and clock speed of the compute cores as well – a power waster.

In Haswell, Piazza explained, "The ring is totally isolated – a separate domain – and we can move the ring up and down, the graphics up and down, and the CPU up and down independently."

If you're running a Haswell part with GT3 graphics, for example, you have so much additional graphics performance that you can run the graphics innards at lower power, raise the ring bus just a bit to take advantage of all of GT3's bandwidth, but drop the compute cores' power down to a lower voltage state to save juice.
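
To see why decoupling matters, here is a minimal sketch built on the textbook approximation that dynamic power scales with capacitance times voltage squared times frequency. Every number in it (the capacitance weights, the two operating points, the domain names) is invented for illustration and is not an Intel figure, and the model ignores leakage and everything else a real power-management unit tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    voltage: float   # volts (illustrative)
    freq_ghz: float  # gigahertz (illustrative)


# Made-up relative switching capacitance per clock domain.
CAPACITANCE = {"cpu": 10.0, "ring": 2.0, "graphics": 6.0}

# Two hypothetical operating points.
LOW = OperatingPoint(voltage=0.8, freq_ghz=1.0)
TURBO = OperatingPoint(voltage=1.0, freq_ghz=2.0)


def dynamic_power(capacitance: float, point: OperatingPoint) -> float:
    """Textbook approximation: P is proportional to C * V^2 * f."""
    return capacitance * point.voltage ** 2 * point.freq_ghz


def total_power(points: dict) -> float:
    """Sum the dynamic power of every domain at its operating point."""
    return sum(dynamic_power(CAPACITANCE[name], p) for name, p in points.items())


# Locked domains: turbo for graphics and ring drags the CPU cores up too.
locked = {"cpu": TURBO, "ring": TURBO, "graphics": TURBO}

# Independent domains: graphics and ring turbo while the CPU cores stay low.
decoupled = {"cpu": LOW, "ring": TURBO, "graphics": TURBO}

print(f"locked:    {total_power(locked):.1f}")
print(f"decoupled: {total_power(decoupled):.1f}")
```

With these invented numbers the decoupled configuration comes out lower, which is the shape of the saving Piazza describes; the real magnitudes depend on data Intel has not published.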

Slide from Intel Developer Forum 2012 providing details of Intel's 4th Generation Core Processor, codenamed 'Haswell'

There are a host of activities going on in Haswell's scalable graphics core, even in its most basic form

The basic building blocks of the Haswell processor, as mentioned above, are similar to those in the two previous generations, but with the addition of what Piazza called "some goodies in there for new features." There is, for example, a new resource streamer that handles in hardware many of the buffer-management tasks that otherwise would be handled by driver software.

The GT3 implementation is particularly interesting – look for it in higher-end client Haswell chips when they start shipping, likely late next year or early in 2014. Compared with GT2, GT3 has twice the amount of raw pixel and shader performance clock-for-clock, more vertex throughput, twice the rasterization and pixel backend, and twice the Level 3 cache – two interleaved Level 3 caches, to be exact – to feed the beast with about half a terabyte of bandwidth within the chip.

Piazza would not be drawn out, however, on how many execution units (EUs) are in each of the three GT implementations. When asked how many general-purpose computing on GPU (GPGPU) single-precision flops the Haswell chips would be able to achieve, he demurred. "Now you're asking me to tell you how many shader units, etcetera," he said, "and I'm not going to disclose that."

He did, however, offer a hint. "Since we're not saying how many shaders, each of our shader units can do eight FMAs [fused multiply-adds – remember them from the compute core?] per flop."
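
Piazza's hint is enough to set up the arithmetic, if not to finish it. The sketch below assumes he meant eight FMAs per clock per shader unit (the quote as given says "per flop"), and counts each FMA as two floating-point operations, a multiply and an add. The unit count and clock speed passed in are placeholders, since Intel has disclosed neither.

```python
def peak_sp_gflops(execution_units: int, clock_ghz: float,
                   fmas_per_clock: int = 8) -> float:
    """Theoretical peak single-precision GFLOPS.

    Assumes each unit retires `fmas_per_clock` fused multiply-adds every
    clock, and that one FMA counts as two floating-point operations.
    """
    flops_per_fma = 2
    return execution_units * fmas_per_clock * flops_per_fma * clock_ghz


# Hypothetical inputs: 20 units at 1.0GHz. Neither figure comes from Intel.
print(peak_sp_gflops(execution_units=20, clock_ghz=1.0))  # 320.0
```

Under those assumptions, doubling the number of units doubles the peak, which is the kind of scaling GT3 is pitched on.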

Slide from Intel Developer Forum 2012 providing details of Intel's 4th Generation Core Processor, codenamed 'Haswell'

Haswell's GT3 is like Wrigley's gum: double your pleasure, double your pixels, shaders, bandwidth...

In addition to these graphics tune-ups, Haswell also features a raft of media-processing improvements, including video-codec expansions and video- and image-processing enhancements, with – of course – power management and optimizations thrown in for good measure.

Haswell, for example, adds decoding for SVC – scalable video coding, an extension to the H.264 codec standard – to the AVC, VC1, and MPEG2 support that's in Ivy Bridge. SVC, said Intel Fellow Hong Jiang, is a "key enabler" for multi-participant video conferencing and streaming-media servers.

Also aboard Haswell are a motion JPEG (MJPEG) decoder for low-power USB-webcam video conferencing, and an MPEG2 hardware encoder useful for DVD creation and DLNA (Digital Living Network Alliance) digital media streaming – think music, photos, and video for home entertainment use.

Media enhancements include a dedicated Video Quality Engine (VQE) that adds color-gamut expansion – which Hong said maintains color saturation and improves visual quality – plus tunable skin-tone image enhancement, frame-rate conversion, and image stabilization to the suite of video-quality enhancements already in Ivy Bridge. Those include such niceties as de-noise and de-interlace circuitry, as well as adaptive contrast enhancement and more.

Slide from Intel Developer Forum 2012 providing details of Intel's 4th Generation Core Processor, codenamed 'Haswell'

Media improvements hew to Intel's marketing push: it's all about 'the experience'

As with the graphics core, the GT1, GT2, and GT3 versions have a rising range of media-handling capabilities, with the GT3 version having twice the media-sampling and VQE throughput of GT2.

Haswell also adds support for 4K video – which, depending upon the industry 4K standard you prefer, refers to video resolutions of 4096-by-2304, 4096-by-2160, or 3840-by-2160 pixels. That last one, by the way, is also known as QFHD (quad full HD), and was recently adopted by Sony for its 4K video.
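
For a sense of scale, the short sketch below counts the pixels in each of those three 4K formats and compares them with full HD, taken here as 1920-by-1080. The names are ours, not Intel's.

```python
FULL_HD = (1920, 1080)

# The three resolutions commonly labelled "4K", as listed above.
FOUR_K_FORMATS = {
    "4096x2304": (4096, 2304),
    "4096x2160": (4096, 2160),
    "3840x2160 (QFHD)": (3840, 2160),
}


def pixel_count(resolution: tuple) -> int:
    """Total pixels in a (width, height) frame."""
    width, height = resolution
    return width * height


for name, resolution in FOUR_K_FORMATS.items():
    ratio = pixel_count(resolution) / pixel_count(FULL_HD)
    print(f"{name}: {pixel_count(resolution):,} pixels, {ratio:.2f}x full HD")

# 4096x2304: 9,437,184 pixels, 4.55x full HD
# 4096x2160: 8,847,360 pixels, 4.27x full HD
# 3840x2160 (QFHD): 8,294,400 pixels, 4.00x full HD
```

QFHD earns its "quad" honestly: it is exactly four full-HD frames' worth of pixels.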

Hong included in his presentation a slide that showed the powers of Haswell's tunable skin-tone processing, which works its image alchemy by examining a pixel and its neighbors, calculating the likelihood that the pixel is displaying a skin color, then adjusting the pixel and its neighbors to display the amount of detail desired.
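
The general idea can be mocked up in a few lines. What follows is a toy sketch, not Intel's filter: the skin score is a crude hand-rolled heuristic (red above green above blue, with a made-up threshold of 80), the neighborhood is a plain 3-by-3 window, and the function names and the `strength` knob are all hypothetical.

```python
def skin_score(pixel):
    """Crude illustrative score in [0, 1]; not a real skin classifier."""
    r, g, b = pixel
    if not (r > g > b):
        return 0.0
    # Score rises with red-blue separation, saturating at a gap of 80.
    return min(1.0, (r - b) / 80.0)


def window(image, y, x):
    """Pixels in the 3x3 neighborhood of (y, x), clipped at the borders."""
    height, width = len(image), len(image[0])
    return [image[j][i]
            for j in range(max(0, y - 1), min(height, y + 2))
            for i in range(max(0, x - 1), min(width, x + 2))]


def tune_skin(image, strength):
    """Blend skin-like pixels toward (strength > 0) or away from
    (strength < 0) their local average; non-skin pixels are left alone."""
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, pixel in enumerate(row):
            neighbors = window(image, y, x)
            # Likelihood uses the pixel and its neighbors together.
            likelihood = sum(skin_score(p) for p in neighbors) / len(neighbors)
            mean = [sum(p[c] for p in neighbors) / len(neighbors)
                    for c in range(3)]
            weight = strength * likelihood
            new_row.append(tuple(
                max(0, min(255, round(value + weight * (avg - value))))
                for value, avg in zip(pixel, mean)))
        out.append(new_row)
    return out


# A tiny 2x2 "face" of RGB tuples with some tonal variation.
face = [[(200, 150, 120), (220, 170, 140)],
        [(200, 150, 120), (220, 170, 140)]]
smoothed = tune_skin(face, strength=1.0)    # detail flattened to the average
sharpened = tune_skin(face, strength=-1.0)  # differences from average doubled
```

A positive `strength` irons detail out of skin-like regions and a negative one exaggerates it, while pixels that score zero pass through untouched.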

Slide from Intel Developer Forum 2012 providing details of Intel's 4th Generation Core Processor, codenamed 'Haswell'

Haswell's skin-tone enhancements may seem subtle, but look closer – they're not

With this capability – which Intel in its engineering geekiness calls the Skin Tone Tuned Image Enhancement Filter – you can merely enhance the skin tones in your subject's image, crank them up with higher contrast, or smooth them out – great news for those aging subjects who might want their wrinkles de-emphasized.

Such candidates for facial de-emphasization would most certainly include your aging Reg reporter, who after days of sitting in darkened IDF technical sessions packed with high-level chipheads emerged feeling rather wrinkly, indeed – with said wrinkles appearing not only on his tired visage, but also in his equally exhausted brain. ®
