Deep inside AMD's master plan to topple Intel

Back to the top on a radical GPU

Thanks for the (shared) memory

To feed all those command streams, Demers says, a new memory system is needed. In previous AMD GPU architectures, the memory system was a read-only cache; in the new architecture, it's read-write. "It's a generalized cache just like we have in CPUs," he says.

Total bandwidth between the CUs and the caches is, of course, dependent upon the number of CUs and the clock speed. Assuming a clock of around one gigahertz, "If you think of a CU as the equivalent of a SIMD – which isn't the case, but today we ship with 24 of these – 24 CUs would be one and a half terabytes of bandwidth to their L1 caches," Demers says. "Pretty good numbers."

Don't expect AMD to stick to 24-CU implementations, however. Demers talked of future designs with over a hundred CUs – and it's not tough to do the math to figure out what the total cache bandwidth would be in such chips: at the same per-CU rate, 100 CUs would top 6 terabytes per second of total bandwidth.
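For readers who want to check that arithmetic, here is a minimal back-of-envelope sketch. It takes Demers' quoted figures at face value (24 CUs, 1.5 terabytes per second, a clock of exactly 1GHz) and assumes bandwidth scales linearly with CU count; the function names are made up for illustration, and none of this is an AMD-published formula.

```python
# Back-of-envelope scaling of the quoted L1 bandwidth figures.
# Assumptions (not from AMD): decimal units, an exact 1 GHz clock,
# and bandwidth that scales linearly with the number of CUs.

QUOTED_CUS = 24
QUOTED_TOTAL_TBPS = 1.5   # "one and a half terabytes", read as per second
CLOCK_HZ = 1e9            # "around one gigahertz"


def per_cu_gbps():
    """Implied L1 bandwidth of one CU, in gigabytes per second."""
    return QUOTED_TOTAL_TBPS * 1000 / QUOTED_CUS


def scaled_total_tbps(cus):
    """Total L1 bandwidth for a chip with `cus` CUs, in terabytes per second."""
    return per_cu_gbps() * cus / 1000


def bytes_per_clock():
    """Implied bytes moved per CU per clock cycle."""
    return per_cu_gbps() * 1e9 / CLOCK_HZ


if __name__ == "__main__":
    print(f"per CU:  {per_cu_gbps():.1f} GB/s")
    print(f"100 CUs: {scaled_total_tbps(100):.2f} TB/s")
    print(f"per CU per clock: {bytes_per_clock():.1f} bytes")
```

Under those assumptions, each CU works out to 62.5GB/s, or roughly 64 bytes per clock once you allow for the rounding in "one and a half", and a hypothetical 100-CU part lands at 6.25TB/s.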

To add more memory-system versatility, there's a full interconnect between the L1 and L2 caches. "The L2s are more physically based. They match your memory," Demers explained. "They're also where all the coherency happens – and that's what I mean by the physical binding of the L2s."

The L1s get their data from their associated L2s, but the L2s – since they're the soul of coherency – will communicate with one another. The GCN also envisions coherency being handled between both CPU and GPU at the L2 level. "I'm talking probe traffic," Demers says, "I'm talking all the usual stuff you've come to expect on coherency."

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

GPU CUs and CPU cores will find coherency at the L2 level. Discrete GPUs can join over PCIe

With all the CUs having access to all the data that's in the L2 farm, time-consuming trips back and forth to and from far-off system memory would be minimized, pruning latency. Discrete GPUs will also join in the coherency mix, with all traffic being tunneled over PCIe. "Discrete GPUs and Fusion APUs will all use the same core technology," Demers explains.

x86 spoken here

x86 support, he says, means that "our GPUs have to have address-translation caches. Basically, they take virtual addresses and they translate that into physical addresses." Address-translation caches already exist in AMD GPUs, but in the new architecture, they'll be talking in x86 language.

On the CPU side, "an OS-visible IOMMU [input/output memory-management unit] – just like the CPU has an MMU, which handles physical to virtual translation on the CPU – needs to exist," Demers says.

With an IOMMU – which will be part of both AMD's discrete CPUs and APUs – the chips will be able to support address-translation requests. Demers also notes that should there be a page fault, "the GPU will be happy with that – well, not necessarily happy, but it will survive that. It will wait until that page is brought in by the operating system and made local, then – bang! – it'll keep on running."
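The flow Demers describes (look up a virtual address in a translation cache, fall back to the page tables, and stall on a page fault until the OS brings the page in) can be sketched as a toy model. Everything here is a hypothetical illustration: the class name, the 4KB page size, and the trivial frame allocator are assumptions, not details of AMD's hardware or of any real IOMMU.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyTranslator:
    """Toy model of a device-side translation cache in front of an
    OS-managed page table. Not a description of real hardware."""

    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame (owned by the "OS")
        self.tlb = {}         # device-side address-translation cache
        self.next_frame = 0   # trivial frame allocator
        self.faults = 0       # page faults serviced so far

    def _os_page_in(self, vpage):
        # Stand-in for the OS servicing a page fault and making the page local.
        self.faults += 1
        frame = self.next_frame
        self.next_frame += 1
        self.page_table[vpage] = frame
        return frame

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.tlb:                  # translation-cache hit
            frame = self.tlb[vpage]
        else:
            frame = self.page_table.get(vpage)
            if frame is None:                  # page fault: wait for the OS
                frame = self._os_page_in(vpage)
            self.tlb[vpage] = frame            # cache the translation
        return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    gpu = ToyTranslator()
    print(hex(gpu.translate(0x1234)), gpu.faults)  # first touch faults
    print(hex(gpu.translate(0x1238)), gpu.faults)  # same page, cached
```

The second lookup hits the translation cache, so the fault counter does not move: the page was "brought in" once and the device keeps on running.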

The x86 address space will provide "all the goodness" that comes from a virtual address space, and will be available for the GPU in the new architecture, Demers said, specifically citing over-subscription. "Our plan is that eventually all these devices – whether CPUs or GPUs – are in the same unified 64-bit address space."

As might be assumed from Demers' page-fault example, OS support will be required for IOMMUs, just as it is for MMUs, so AMD is now working with operating-system designers. Although he didn't specifically say which ones, Microsoft's presence at AMD's event might well be counted as a major hint.

All these features will stretch across AMD's graphics-capable product line. "I'm not talking about an APU, I'm not talking about a GPU, I'm talking about an IP of a core that's going to be used in all our products going forward," Demers says. "Over the next few years we're going to be bringing you all of this throughout all of our products that have GPU cores."

Meat and potatoes

Despite spending a raft of development time on this fundamentally different GPU architecture, AMD also spent some time digging into such meat-and-potatoes graphics necessities as good ol' 3D performance.

Heterogeneity is all well and good, but AMD has some 3D improvement in mind, as well

"I did say that 3D and compute are starting to merge – and in my mind they already have," Demers says. "Somebody recently asked me about APIs – well, we're full of ideas for graphics. And we still love APIs and we think that developers will continue to use APIs."

He suggests that some developers will want to "go directly to compute," but says that AMD will continue to work with partners such as Khronos – the OpenCL caretaker – and DX11-provider Microsoft to expose to devs more of the features that AMD provides in its hardware.

As an example of something that the new architecture will support, Demers offers partially resident textures (PRTs), which he defined as the ability to "tell an application: 'Look, create textures of any size you want, and then bring in the parts that you need when you want them'."
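As a rough illustration of the idea, the toy sketch below models a texture of arbitrary nominal size whose tiles are loaded only on demand. The class, the 64-texel tile size, and the "report a miss to the caller" behavior are all assumptions made for the example; the article does not describe how AMD's hardware or any graphics API actually exposes PRTs.

```python
TILE = 64  # assumed tile edge in texels, for illustration only


class ToyPartiallyResidentTexture:
    """Toy model: a huge nominal texture backed only by the tiles in use."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.resident = {}  # (tile_x, tile_y) -> tile payload

    def load_tile(self, tile_x, tile_y, payload):
        # "Bring in the parts that you need when you want them."
        self.resident[(tile_x, tile_y)] = payload

    def sample(self, x, y):
        """Return (is_resident, payload) for the tile covering texel (x, y)."""
        key = (x // TILE, y // TILE)
        if key not in self.resident:
            return False, None  # caller decides: fall back, or request the tile
        return True, self.resident[key]


if __name__ == "__main__":
    tex = ToyPartiallyResidentTexture(1 << 20, 1 << 20)  # nominally enormous
    print(tex.sample(100, 100))      # tile (1, 1) not loaded yet
    tex.load_tile(1, 1, "rock")
    print(tex.sample(100, 100))      # now resident
```

The nominal texture here is a million texels on a side, yet the only memory consumed is the one tile that was explicitly loaded.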

2013 and beyond

All this new stuff doesn't mean that all the old stuff has been jettisoned. Fixed-function elements such as Raster Ops and Z units, for example, are still there with their own caches. "We don't want to get rid of any things that are good in our core," Demers says. "We're going to continue to drive [fixed-function features] forward and continue to put more of those units [on chip] as cost and process allow us."

The read/write cache in the GCN will also be available as a texture cache. "Larger caches, higher throughputs – those are going to benefit texturing as well," Demers says. In addition, true virtual memory will enable such niftiness as being able to pre-compute massive scenes and load portions only as needed, smoothing performance.

"I really am excited about Fusion System Architecture (FSA) and 3D merging," Demers says – excitably, as one might imagine. "Compute, and graphics APIs, and hybrids of all those things – it's really cool."

Unfortunately, you'll have to wait a while to experience that coolness. AMD's Bulldozer-based APU, Trinity, which was demoed on the same Fusion Summit stage two days before Demers' presentation, will be VLIW-based when it appears next year. Best-guesstimates put GCN-based APUs somewhere in the 2013 time frame.

With the introduction of FSA and the GCN – oh, and let's not forget the Bulldozer and Bobcat CPU cores – AMD is betting the farm that the future will belong to heterogeneous computing, where tasks are given to various and sundry cores according to their ability, and distributed from apps according to their need.

For AMD's sake, let's hope that they have seen the future, and that their implementation works better than did the terrestrial analog of that to/from equation. ®
