Feeds

Deep inside AMD's master plan to topple Intel

Back to the top on a radical GPU

Choosing a cloud hosting partner with confidence

The heterogeneous future

GNC's goal is twofold: simplify the programming model and make the GPU core more capable of participating in what AMD, ARM, Microsoft and others call "heterogenous computing" – that is, distributing work among CPU, GPU, and more-specialized cores, which each element contributing what it does best.

The major change in the GCN's shader array is that it includes what AMD calls the compute unit (CU), and what Demers calls the "cellular basis" of the design. A CU takes over the chores of the previous architecture's VLIW-based SIMD (single-instruction-stream, multiple-data-stream) elements.

VLIW is gone. The GCN's CUs are fundamentally vector cores containing multiple SIMD structures, programmed in a per-lane basis. Four groups of wavefronts are run in each CU core per cycle. "It's a vector core where each lane is programmed independently, and there's a single stream coming in and broadcast all over those things," Demers says. "You program it in a scalar way, and it operates in a vector mode."

Simply put, a CU might be considered to be a smart VLIW/SIMD structure. In the VLIW world, you'd have to rely on the compiler to load the core correctly and efficiently. If something changes in the instruction stream, the VLIW is too dumb to modify its workload, and pipes might remain unfilled with data, wasting cycles.

As you might guess, that makes VLIW perfectly fine for graphics, where predictability is high, but crappy for compute, where dependencies can and do change at a moment's notice – even if that "moment" is a billionth of a second. Although the CU must work wavefront by wavefront – it's not an out-of-order mind-reader – it can move workloads around radically more nimbly than VLIW.

Core reasoning

This versatility is the – pardon the pun – core reason for the GCN: AMD is planning for a heterogeneous world, in which GPUs are increasingly equal compute partners with CPUs.

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

Is the GCN and its CUs a MIMD, SIMD, or SMT architecture? Yes (click to enlarge)

The CUs can work in virtual space, Demers says, and they'll support the x86 64-bit virtual address space – more on that later. Also, the CUs are supported by a much larger L1 data cache than was in the previous architecture. The cache also has what Demers calls "a significant amount of bandwidth," and is supported by its own control system.

Previous AMD GPU architectures have had what the company has called "hidden fixed-function with hidden state". As examples of such fixed functions, Demers identifies "program counter advancements, and things such as that – limited functionality."

Help with the housekeeping

The GCN moves beyond hidden fixed functions with the addition of a fully observable scalar processor, which frees the CUs from simple tasks – quick math functions, for example, and housekeeping. "It's a processor in its own right," says Demers, and it's responsible for such common code as branching code and common pointers. A vector unit could also handle such common-code chores, but as Demers explains: "The scalar coprocessor helps it out, and offloads those capabilities."

Observability of the CUs and the scalar processor, and support for the x86 virtual space – along with the fact that, Demers says, "you can load the PC from memory or from a register and do all kinds of math" – opens up such C++ features as virtual functions, recursions, and x86 dynamic linked libraries. "All of these become a native thing that this guy can support," he says.

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

Shrinking processes enable more stuff to be stuffed on a chip – so let's add a scalar processor (click to enlarge)

The processing capability boosted by a host of compute units is all well and good, but only if they can be fed the right data to munch on at the right time. To this end, the GCN architecture allows for multiple command streams from multiple applications, each with different priorities and the ability to reserve CUs for themselves.

As an example of this capability, Demers suggests the interaction of your operating system's user interface and an app. "You can have your GUI running at one priority level, and you can set that high, and you can guarantee some amount of compute units always available for it. But then your big background applications for transcode can be running at a lower priority," he says, and you will still have a great quality of service [QoS] – there's no more skipping mouse when you do a big job, because the big job is running in a separate queue."

Website security in corporate America

More from The Register

next story
Bono: Apple will sort out monetising music where the labels failed
Remastered so hard it would be difficult or impossible to master it again
Oi, Tim Cook. Apple Watch. I DARE you to tell me, IN PERSON, that it's secure
State attorney demands Apple CEO bows the knee to him
4K-ing excellent TV is on its way ... in its own sweet time, natch
For decades Hollywood actually binned its 4K files. Doh!
Your chance to WIN the WORLD'S ONLY HANDHELD ZX SPECTRUM
Reg staff not allowed to enter, god dammit
Phones 4u website DIES as wounded mobe retailer struggles to stay above water
Founder blames 'ruthless network partners' for implosion
Monitors monitor's monitoring finds touch screens have 0.4% market share
Not four. Point four. Count yer booty again, Microsoft
Getting to the BOTTOM of the great office seating debate
Belay that toil, me hearty, and park your scurvy backside
Hey, Mac fanbois. HGST wants you drooling over its HUGE desktop RACK
What vast digital media repository could possibly need 64 TERABYTES?
prev story

Whitepapers

Secure remote control for conventional and virtual desktops
Balancing user privacy and privileged access, in accordance with compliance frameworks and legislation. Evaluating any potential remote control choice.
WIN a very cool portable ZX Spectrum
Win a one-off portable Spectrum built by legendary hardware hacker Ben Heck
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Beginner's guide to SSL certificates
De-mystify the technology involved and give you the information you need to make the best decision when considering your online security options.