
Deep inside AMD's master plan to topple Intel

Back to the top on a radical GPU

Thanks for the (shared) memory

To feed all those command streams, Demers says, a new memory system is needed. In previous AMD GPU architectures, the memory system was a read-only cache; in the new architecture, it's read-write. "It's a generalized cache just like we have in CPUs," he says.

Total bandwidth between the CUs and the caches is, of course, dependent upon the number of CUs and the clock speed. Assuming a clock of around one gigahertz, "If you think of a CU as the equivalent of a SIMD – which isn't the case, but today we ship with 24 of these – 24 CUs would be one and a half terabytes of bandwidth to their L1 caches," Demers says. "Pretty good numbers."

Don't expect AMD to stick to 24-CU implementations, however. Demers talked of future designs with over a hundred CUs – and it's not tough to do the math to figure out what the total cache bandwidth would be in such chips: at the same clock speed, 100 CUs would top 6 terabytes per second of total bandwidth.
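For readers who want to check that math, here is a minimal sketch. The figure of 64 bytes per clock per CU is our assumption, inferred from Demers' numbers (roughly 1.5 terabytes per second across 24 CUs at about 1GHz); he did not state it on stage, and the function name is ours.

```python
# Back-of-the-envelope estimate of aggregate CU-to-L1 bandwidth.
# ASSUMPTION: each CU moves 64 bytes per clock to its L1 cache. That figure
# is inferred from the quoted 1.5TB/s for 24 CUs, not taken from the talk.
BYTES_PER_CLOCK_PER_CU = 64
CLOCK_HZ = 1_000_000_000  # "around one gigahertz"


def total_l1_bandwidth_tb_per_s(num_cus,
                                clock_hz=CLOCK_HZ,
                                bytes_per_clock=BYTES_PER_CLOCK_PER_CU):
    """Return aggregate bandwidth in terabytes per second (1TB = 10**12 bytes)."""
    return num_cus * bytes_per_clock * clock_hz / 1e12


if __name__ == "__main__":
    for cus in (24, 100):
        print(f"{cus} CUs: {total_l1_bandwidth_tb_per_s(cus):.2f} TB/s")
```

Under that assumption, 24 CUs come out at about 1.54TB/s and 100 CUs at 6.4TB/s, which squares with both figures above.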

To add more memory-system versatility, there's a full interconnect between the L1 and L2 caches. "The L2s are more physically based. They match your memory," Demers explained. "They're also where all the coherency happens – and that's what I mean by the physical binding of the L2s."

The L1s get their data from their associated L2s, but the L2s – since they're the soul of coherency – will communicate with one another. The GCN also envisions coherency being handled between both CPU and GPU at the L2 level. "I'm talking probe traffic," Demers says, "I'm talking all the usual stuff you've come to expect on coherency."

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

GPU CUs and CPU cores will find coherency at the L2 level. Discrete GPUs can join over PCIe

With all the CUs having access to all the data that's in the L2 farm, time-consuming trips back and forth to and from far-off system memory would be minimized, pruning latency. Discrete GPUs will also join in the coherency mix, with all traffic being tunneled over PCIe. "Discrete GPUs and Fusion APUs will all use the same core technology," Demers explains.

x86 spoken here

x86 support, he says, means that "our GPUs have to have address-translation caches. Basically, they take virtual addresses and they translate that into physical addresses." Address-translation caches already exist in AMD GPUs, but in the new architecture, they'll be talking in x86 language.
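What an address-translation cache does can be shown with a toy model. This is a sketch under our own assumptions, not AMD's design: the class name, the 4KB page size, the LRU eviction policy and the dictionary standing in for a page table are all illustrative.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # ASSUMPTION: 4KB pages, the common x86 base page size


class TranslationCache:
    """Toy LRU cache of virtual-page to physical-frame mappings."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # dict: virtual page number -> frame number
        self.capacity = capacity      # must be at least 1
        self.entries = OrderedDict()  # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # Stand-in for a page-table walk; KeyError means "not mapped".
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset
```

The point of the cache is the hit path: a repeated access to the same page skips the walk entirely, which is why such structures matter once a GPU starts working with x86 virtual addresses.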

On the CPU side, "an OS-visible IOMMU [input/output memory-management unit] – just like the CPU has an MMU, which handles physical to virtual translation on the CPU – needs to exist," Demers says.

With an IOMMU – which will be part of both AMD's discrete CPUs and APUs – the chips will be able to support address-translation requests. Demers also notes that should there be a page fault, "the GPU will be happy with that – well, not necessarily happy, but it will survive that. It will wait until that page is brought in by the operating system and made local, then – bang! – it'll keep on running."
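The fault-and-resume behaviour Demers describes can be sketched in a few lines. Everything here is hypothetical scaffolding of our own (ToyOS, PageFault, gpu_read), and the "wait" is modelled as a plain function call rather than a real hardware stall.

```python
class PageFault(Exception):
    """Raised when a requested page is not resident in memory."""

    def __init__(self, vpn):
        super().__init__(f"page {vpn} not resident")
        self.vpn = vpn


class ToyOS:
    """Hypothetical OS model: pages live in a backing store until faulted in."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict: page number -> data "on disk"
        self.resident = {}                  # dict: page number -> data in memory
        self.faults_serviced = 0

    def read(self, vpn):
        if vpn not in self.resident:
            raise PageFault(vpn)
        return self.resident[vpn]

    def service_fault(self, vpn):
        """Bring the page in and make it local."""
        self.resident[vpn] = self.backing_store[vpn]
        self.faults_serviced += 1


def gpu_read(os_, vpn):
    """Read a page; on a fault, wait for the OS to page it in, then retry."""
    try:
        return os_.read(vpn)
    except PageFault as fault:
        os_.service_fault(fault.vpn)  # the GPU "survives" by waiting here
        return os_.read(vpn)          # bang: it keeps on running
```

A first gpu_read of any page takes the fault path once; later reads of the same page return straight away.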

The x86 address space will provide "all the goodness" that comes from a virtual address space, and will be available for the GPU in the new architecture, Demers said, specifically citing over-subscription. "Our plan is that eventually all these devices – whether CPUs or GPUs – are in the same unified 64-bit address space."

As might be assumed due to Demers' page-fault example, OS support will be required for IOMMUs, just like it is on MMUs, so AMD is now working with operating-system designers. Although he didn't specifically say which ones, Microsoft's presence at AMD's event might well be counted as a major hint.

All these features will stretch across AMD's graphics-capable product line. "I'm not talking about an APU, I'm not talking about a GPU, I'm talking about an IP of a core that's going to be used in all our products going forward," Demers says. "Over the next few years we're going to be bringing you all of this throughout all of our products that have GPU cores."

Meat and potatoes

Despite spending a raft of development time on this fundamentally different GPU architecture, AMD also spent some time digging into such meat-and-potatoes graphics necessities as good ol' 3D performance.

AMD Fusion Summit 2011 keynote presentation slide: 'Evolution of AMD's Graphics Core, and Preview of Graphics Core Next'

Heterogeneity is all well and good, but AMD has some 3D improvement in mind, as well

"I did say that 3D and compute are starting to merge – and in my mind they already have," Demers says. "Somebody recently asked me about APIs – well, we're full of ideas for graphics. And we still love APIs and we think that developers will continue to use APIs."

He suggests that some developers will want to "go directly to compute," but he said that AMD would continue to work with partners such as Khronos – the OpenCL caretaker – and DX11-provider Microsoft to expose to devs more features that AMD provides in its hardware.

As an example of something that the new architecture will support, Demers offers partially resident textures (PRTs), which he defined as the ability to "tell an application: 'Look, create textures of any size you want, and then bring in the parts that you need when you want them'."
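The idea behind PRTs can be illustrated with a toy tiled texture. This is a sketch under our own assumptions: the class, the tile size and the load_tile callback are hypothetical, and loading a tile on first touch is a simplification, since in a real API the application would decide which tiles to make resident.

```python
class PartiallyResidentTexture:
    """Toy texture that keeps only the touched tiles in memory."""

    def __init__(self, width, height, tile_size, load_tile):
        self.width = width
        self.height = height
        self.tile_size = tile_size
        self.load_tile = load_tile  # callback: (tile_x, tile_y) -> rows of texels
        self.tiles = {}             # resident tiles, keyed by (tile_x, tile_y)

    def sample(self, x, y):
        """Return the texel at (x, y), bringing its tile in if needed."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError(f"texel ({x}, {y}) is outside the texture")
        key = (x // self.tile_size, y // self.tile_size)
        if key not in self.tiles:
            self.tiles[key] = self.load_tile(*key)  # simplification: load on touch
        tile = self.tiles[key]
        return tile[y % self.tile_size][x % self.tile_size]

    def resident_fraction(self):
        """Fraction of the texture's tiles currently held in memory."""
        tiles_x = -(-self.width // self.tile_size)   # ceiling division
        tiles_y = -(-self.height // self.tile_size)
        return len(self.tiles) / (tiles_x * tiles_y)
```

The texture can be declared far larger than available memory, because storage is only spent on the tiles a scene actually samples.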

2013 and beyond

All this new stuff doesn't mean that all the old stuff has been jettisoned. Fixed-function elements such as Raster Ops and Z units, for example, are still there with their own caches. "We don't want to get rid of any things that are good in our core," Demers says. "We're going to continue to drive [fixed-function features] forward and continue to put more of those units [on chip] as cost and process allow us."

The read/write cache in the GCN will also be available as a texture cache. "Larger caches, higher throughputs – those are going to benefit texturing as well," Demers says. In addition, true virtual memory will enable such niftiness as being able to pre-compute massive scenes and load portions only as needed, smoothing performance.

"I really am excited about Fusion System Architecture (FSA) and 3D merging," Demers says – excitably, as one might imagine. "Compute, and graphics APIs, and hybrids of all those things – it's really cool."

Unfortunately, you'll have to wait a while to experience that coolness. AMD's Bulldozer-based APU, Trinity, which was demoed on the same Fusion Summit stage two days before Demers' presentation, will be VLIW-based when it appears next year. Best-guesstimates put GCN-based APUs somewhere in the 2013 time frame.

With the introduction of FSA and the GCN – oh, and let's not forget the Bulldozer and Bobcat CPU cores – AMD is betting the farm that the future will belong to heterogeneous computing, where tasks are given to various and sundry cores according to their ability, and distributed from apps according to their need.

For AMD's sake, let's hope that they have indeed seen the future, and that their implementation works better than did the terrestrial analog of that to/from equation. ®
