
Deep inside AMD's master plan to topple Intel

Back to the top on a radical GPU


The heterogeneous future

GCN's goal is twofold: simplify the programming model and make the GPU core more capable of participating in what AMD, ARM, Microsoft and others call "heterogeneous computing" – that is, distributing work among CPU, GPU, and more-specialized cores, with each element contributing what it does best.

The major change in the GCN's shader array is that it includes what AMD calls the compute unit (CU), and what Demers calls the "cellular basis" of the design. A CU takes over the chores of the previous architecture's VLIW-based SIMD (single-instruction-stream, multiple-data-stream) elements.

VLIW is gone. The GCN's CUs are fundamentally vector cores containing multiple SIMD structures, programmed on a per-lane basis. Four groups of wavefronts are run in each CU core per cycle. "It's a vector core where each lane is programmed independently, and there's a single stream coming in and broadcast all over those things," Demers says. "You program it in a scalar way, and it operates in a vector mode."
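To make "program it in a scalar way, and it operates in a vector mode" a little more concrete, here is a minimal Python sketch. It is an illustration only, not AMD's implementation: the 64-lane wavefront width, the per-lane mask, and the names `scalar_kernel` and `run_wavefront` are all assumptions made for the example.

```python
WAVEFRONT_WIDTH = 64  # assumption: lanes per wavefront; the talk as quoted gives no figure


def scalar_kernel(x):
    """What the programmer writes: ordinary per-item code with a branch."""
    if x % 2 == 0:
        return x * 2
    return x + 1


def run_wavefront(values):
    """How a vector core might run it: one instruction stream, every lane at once."""
    # Step 1: every lane evaluates the branch condition, producing a per-lane mask.
    mask = [v % 2 == 0 for v in values]
    # Step 2: the "then" side is issued once; only masked-on lanes take the result.
    result = [v * 2 if on else v for v, on in zip(values, mask)]
    # Step 3: the "else" side is issued once; only the remaining lanes take the result.
    result = [r if on else v + 1 for r, v, on in zip(result, values, mask)]
    return result


lanes = list(range(WAVEFRONT_WIDTH))
vector_result = run_wavefront(lanes)
```

The programmer only ever thinks about `scalar_kernel`; in this sketch the hardware's job is to make `run_wavefront` give the same answer for every lane.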

Simply put, a CU might be considered to be a smart VLIW/SIMD structure. In the VLIW world, you'd have to rely on the compiler to load the core correctly and efficiently. If something changes in the instruction stream, the VLIW is too dumb to modify its workload, and pipes might remain unfilled with data, wasting cycles.

As you might guess, that makes VLIW perfectly fine for graphics, where predictability is high, but crappy for compute, where dependencies can and do change at a moment's notice – even if that "moment" is a billionth of a second. Although the CU must work wavefront by wavefront – it's not an out-of-order mind-reader – it can move workloads around radically more nimbly than VLIW.
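The unfilled-pipes problem can be sketched with a toy bundle packer. This is a simplified model under stated assumptions, not a real compiler: the four-slot bundle width and the helpers `pack_vliw` and `utilization` are hypothetical, chosen only to show how dependencies starve a statically packed machine.

```python
def pack_vliw(ops, width=4):
    """Greedily pack ops into bundles of `width` slots.

    `ops` is a list of (name, deps) pairs in program order. An op may join
    a bundle only if everything it depends on finished in an earlier bundle.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = [name for name, deps in remaining if deps <= done][:width]
        if not bundle:
            raise ValueError("unsatisfiable dependencies")
        done.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in done]
        bundles.append(bundle)
    return bundles


def utilization(bundles, width=4):
    """Fraction of issue slots that actually carry work."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Graphics-like work: four independent ops (say, the channels of a pixel).
independent = [("r", set()), ("g", set()), ("b", set()), ("a", set())]
# Compute-like work: each op needs the result of the one before it.
chained = [("t0", set()), ("t1", {"t0"}), ("t2", {"t1"}), ("t3", {"t2"})]

graphics_util = utilization(pack_vliw(independent))
compute_util = utilization(pack_vliw(chained))
```

In this model the independent ops fill a single bundle, while the dependent chain needs four bundles with three slots idle in each, which is the wasted-cycles case described above.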

Core reasoning

This versatility is the – pardon the pun – core reason for the GCN: AMD is planning for a heterogeneous world, in which GPUs are increasingly equal compute partners with CPUs.

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

Are the GCN and its CUs a MIMD, SIMD, or SMT architecture? Yes

The CUs can work in virtual space, Demers says, and they'll support the x86 64-bit virtual address space – more on that later. Also, the CUs are supported by a much larger L1 data cache than was in the previous architecture. The cache also has what Demers calls "a significant amount of bandwidth," and is supported by its own control system.

Previous AMD GPU architectures have had what the company has called "hidden fixed-function with hidden state". As examples of such fixed functions, Demers identifies "program counter advancements, and things such as that – limited functionality."

Help with the housekeeping

The GCN moves beyond hidden fixed functions with the addition of a fully observable scalar processor, which frees the CUs from simple tasks – quick math functions, for example, and housekeeping. "It's a processor in its own right," says Demers, and it's responsible for such common code as branching code and common pointers. A vector unit could also handle such common-code chores, but as Demers explains: "The scalar coprocessor helps it out, and offloads those capabilities."

Observability of the CUs and the scalar processor, and support for the x86 virtual space – along with the fact that, Demers says, "you can load the PC from memory or from a register and do all kinds of math" – open up such C++ features as virtual functions, recursion, and x86 dynamic-link libraries. "All of these become a native thing that this guy can support," he says.
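Why does loading the program counter from memory matter? A toy interpreter shows the idea. Everything here is hypothetical (the instruction names, the `vtable_slot` key, the `run` function); it is a sketch of an indirect jump in general, not of GCN's instruction set.

```python
def run(program, memory, regs):
    """Execute a tiny instruction list and return what it emitted."""
    pc = 0
    trace = []
    while True:
        op, arg = program[pc]
        if op == "halt":
            return trace
        if op == "emit":
            trace.append(arg)
            pc += 1
        elif op == "jump_reg":
            pc = regs[arg]      # PC loaded from a register
        elif op == "jump_mem":
            pc = memory[arg]    # PC loaded from memory, like a vtable slot
        else:
            raise ValueError(f"unknown op: {op}")


program = [
    ("jump_mem", "vtable_slot"),  # 0: the virtual call; target decided at run time
    ("emit", "Circle.draw"),      # 1
    ("halt", None),               # 2
    ("emit", "Square.draw"),      # 3
    ("halt", None),               # 4
]

circle_trace = run(program, {"vtable_slot": 1}, {})
square_trace = run(program, {"vtable_slot": 3}, {})
```

The same instruction at address 0 lands in different code depending on what is stored in memory, which is the mechanism a C++ virtual call relies on. A machine whose jump targets are all fixed when the code is compiled cannot express that call natively.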


Shrinking processes enable more stuff to be stuffed on a chip – so let's add a scalar processor

The processing capability boosted by a host of compute units is all well and good, but only if they can be fed the right data to munch on at the right time. To this end, the GCN architecture allows for multiple command streams from multiple applications, each with different priorities and the ability to reserve CUs for themselves.

As an example of this capability, Demers suggests the interaction of your operating system's user interface and an app. "You can have your GUI running at one priority level, and you can set that high, and you can guarantee some amount of compute units always available for it. But then your big background applications for transcode can be running at a lower priority," he says, "and you will still have a great quality of service [QoS] – there's no more skipping mouse when you do a big job, because the big job is running in a separate queue."
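Demers' GUI-versus-transcode example can be modelled in a few lines. This is a sketch of one plausible policy, not AMD's scheduler: the `Queue` fields, the `allocate` function, the 32-CU total, and the rule that reserved CUs stay set aside even when their owner is idle are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    priority: int   # higher value is served first
    reserved: int   # CUs held back for this queue alone
    demand: int     # CUs the queue could use right now


def allocate(queues, total_cus):
    """Grant CUs: reservations first, then the shared pool in priority order."""
    reserved_total = sum(q.reserved for q in queues)
    if reserved_total > total_cus:
        raise ValueError("reservations exceed available CUs")
    # Each queue can always draw on its own reservation, up to its demand.
    grants = {q.name: min(q.demand, q.reserved) for q in queues}
    # Whatever is not reserved is shared, highest priority first.
    pool = total_cus - reserved_total
    for q in sorted(queues, key=lambda q: q.priority, reverse=True):
        extra = min(pool, q.demand - grants[q.name])
        grants[q.name] += extra
        pool -= extra
    return grants


TOTAL_CUS = 32  # hypothetical chip size
quiet_gui = allocate(
    [Queue("gui", 10, 4, 2), Queue("transcode", 1, 0, 64)], TOTAL_CUS
)
busy_gui = allocate(
    [Queue("gui", 10, 4, 8), Queue("transcode", 1, 0, 64)], TOTAL_CUS
)
```

In this model the transcode job can never take the GUI's reserved units however greedy it is, and when the GUI gets busy it also jumps the line for the shared pool. That is the "no more skipping mouse" behavior in miniature.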
