The heterogeneous future
GNC's goal is twofold: simplify the programming model and make the GPU core more capable of participating in what AMD, ARM, Microsoft and others call "heterogenous computing" – that is, distributing work among CPU, GPU, and more-specialized cores, which each element contributing what it does best.
The major change in the GCN's shader array is that it includes what AMD calls the compute unit (CU), and what Demers calls the "cellular basis" of the design. A CU takes over the chores of the previous architecture's VLIW-based SIMD (single-instruction-stream, multiple-data-stream) elements.
VLIW is gone. The GCN's CUs are fundamentally vector cores containing multiple SIMD structures, programmed in a per-lane basis. Four groups of wavefronts are run in each CU core per cycle. "It's a vector core where each lane is programmed independently, and there's a single stream coming in and broadcast all over those things," Demers says. "You program it in a scalar way, and it operates in a vector mode."
Simply put, a CU might be considered to be a smart VLIW/SIMD structure. In the VLIW world, you'd have to rely on the compiler to load the core correctly and efficiently. If something changes in the instruction stream, the VLIW is too dumb to modify its workload, and pipes might remain unfilled with data, wasting cycles.
As you might guess, that makes VLIW perfectly fine for graphics, where predictability is high, but crappy for compute, where dependencies can and do change at a moment's notice – even if that "moment" is a billionth of a second. Although the CU must work wavefront by wavefront – it's not an out-of-order mind-reader – it can move workloads around radically more nimbly than VLIW.
This versatility is the – pardon the pun – core reason for the GCN: AMD is planning for a heterogeneous world, in which GPUs are increasingly equal compute partners with CPUs.
The CUs can work in virtual space, Demers says, and they'll support the x86 64-bit virtual address space – more on that later. Also, the CUs are supported by a much larger L1 data cache than was in the previous architecture. The cache also has what Demers calls "a significant amount of bandwidth," and is supported by its own control system.
Previous AMD GPU architectures have had what the company has called "hidden fixed-function with hidden state". As examples of such fixed functions, Demers identifies "program counter advancements, and things such as that – limited functionality."
Help with the housekeeping
The GCN moves beyond hidden fixed functions with the addition of a fully observable scalar processor, which frees the CUs from simple tasks – quick math functions, for example, and housekeeping. "It's a processor in its own right," says Demers, and it's responsible for such common code as branching code and common pointers. A vector unit could also handle such common-code chores, but as Demers explains: "The scalar coprocessor helps it out, and offloads those capabilities."
Observability of the CUs and the scalar processor, and support for the x86 virtual space – along with the fact that, Demers says, "you can load the PC from memory or from a register and do all kinds of math" – opens up such C++ features as virtual functions, recursions, and x86 dynamic linked libraries. "All of these become a native thing that this guy can support," he says.
Shrinking processes enable more stuff to be stuffed on a chip – so let's add a scalar processor (click to enlarge)
The processing capability boosted by a host of compute units is all well and good, but only if they can be fed the right data to munch on at the right time. To this end, the GCN architecture allows for multiple command streams from multiple applications, each with different priorities and the ability to reserve CUs for themselves.
As an example of this capability, Demers suggests the interaction of your operating system's user interface and an app. "You can have your GUI running at one priority level, and you can set that high, and you can guarantee some amount of compute units always available for it. But then your big background applications for transcode can be running at a lower priority," he says, and you will still have a great quality of service [QoS] – there's no more skipping mouse when you do a big job, because the big job is running in a separate queue."
Next page: Thanks for the (shared) memory
Thats always the odd thing.
AMD moans about Intel but Intel does one thing that AMD never does...Advertise.
So whan folks go to buy a PC they get a choice of Intel or AMD.
Well they've never heard of AMD but they get to hear the Intel jingle at least three times a day on TV so they buy from the one brand they have heard of.
AMD needs to sack their marketing team, get a new one and start spending on some jingles etc. after all there is only one other company doing it in their field so how hard can it be?
Even Acer has adverts in the UK.
As for no-one wanting AMD, I build my budget PC boxes with AMD CPUs in them. Why? The saving of using AMD enables me to put a 60GB SSD in the box. Customers dont notice the difference between an Athlon II or a i5 but they notice the SSD. They also get USB3.0/HDMI/usable integrated graphics, which they dont get on the budget Intel motherboards.
You, sir, are of the highest intelligence because you happen to have, with the unusual sharpness of a Moriarty-like mind, recognized, in a flash as it were, what capitalism is all about.
Except for the "all but irrelevant", "things that nobody wants" and "simply don't perform" parts.
Have a cookie. It's a bit mouldy.
they revolutionised CPUs. i remember how great my first Athlon was compared to the intel chips of the time, and cheap too!
This comment made me lol!!!
We should all thank AMD.......
for without AMD Intel would still be selling 500mhz Pentium 5 at $1000.00 per unit.
AMD is far from irrelevant. Now VIA is irrelevant.
AMD is taking RADEON and changing the core of computing.
Intel has Sandy Bridge which is a core duo glued to crummy graphics.
Intel has no graphics capability. RADEON was an expensive purchase but just maybe they knew what they were doing when AMD overpaid for it.