Feeds

AMD's 'Revolution' will be televised ... if its CPU-GPU frankenchip Kaveri is a hit

Graphics and x86 cores are all just 'compute units' now for our games, videos and apps

Secure remote control for conventional and virtual desktops

On the die, memory matters

For one thing, Macri told The Reg in a sit-down after the group briefing, AMD has doubled the size of the branch target buffer in Kaveri over that of its predecessor, Richland.

"It also has one new cute feature in that when it misses in the [instruction] cache now, we kick off a prefetch," he said. "And prefetches are neat in that if they're 100 per cent accurate, then it's super-cool. Getting a memory reference out early helps the memory controller slide it into a place that's idle."

Improving the prefetching of instructions from RAM helps further alleviate the pain of an instruction cache miss: this happens when a core attempts to touch program code outside the cache, forcing it to stall while waiting for the memory controller to copy fresh instructions into the cache. However, by being a bit smarter, "when you shoot a prefetch out, it's like, 'hey, you can take a little longer'," he said.

Such tweaks have allowed Kaveri's designers to boost the CPU's overall instructions-per-clock (IPC) performance by helping out the memory subsystem – which, by the way, has a clock rate of 2400MHz, up from Richland's 2133MHz.

And speaking of memory, Kaveri has two 64-bit, fully independent memory channels. "We do stripe across them," Macri told us, "especially for the memory that's allocated for high-bandwidth needs like graphics."

Kaveri – Graphics Core Next

AMD's Graphics Core Next finally makes it onto a processor die along with CPU cores (click to enlarge)

Compared to discrete GPUs, a 128-bit-wide memory bus might seem – well, does seem – a bit paltry when compared with AMD's most powerful discrete GPU, which has a 512-bit bus. But as Macri points out in defense of the narrower path, Kaveri has just eight GPU cores to feed, whereas the hefty discrete-memory GPUs have more.

"We are a little light on memory bandwidth for graphics," he said, "but we're perfect, I think, on the compute side – or very close to being very well balanced on the compute side."

There are other reasons for the narrower memory bus, not the least being cost – both in terms of package cost and die real estate. A 64-bit memory channel uses about 118 pins for data, address, control, and clocks, he told us, and 0.8mm2 per byte is a good rule of thumb for additional die size. So if you wanted to add another 64-bit memory channel, you'd need to add about 6.5 to 7 mm2 of die real estate and more than 100 extra pins to the package, driving up both cost and size.

"Things just start adding up so that you can't afford it," he said. "And then in small-form factors, there's only enough room in here to have a 128-bit memory bus. And we really optimized Kaveri around ensuring that it can go into a small-form factor all the way up to the desktop. If we were only designing for desktop, I would have probably added another memory channel."

Macri told us that Kaveri's designers did "as much as possible to utilize that memory bandwidth as well as possible." For example, at the back end of the graphics pipes there's a local data share [PDF] between the different stages to reduce the need for having to go off-chip in search of data. In addition, he said, "GCN has a nice array of on-die buffers: L1 caches, L2 caches, local data stores. These all help."

Choosing a cloud hosting partner with confidence

More from The Register

next story
4K-ing excellent TV is on its way ... in its own sweet time, natch
For decades Hollywood actually binned its 4K files. Doh!
Oi, Tim Cook. Apple Watch. I DARE you to tell me, IN PERSON, that it's secure
State attorney demands Apple CEO bows the knee to him
Apple's big bang: iPhone 6, ANOTHER iPhone 6 Plus and WATCH OUT
Let's >sigh< see what Cupertino has been up to for the past year
Phones 4u website DIES as wounded mobe retailer struggles to stay above water
Founder blames 'ruthless network partners' for implosion
Get your Indian Landfill Android One handsets - they're only SIXTY QUID
Cheap and deafening mobes for the subcontinental masses
Apple's SNEAKY plan: COPY ANDROID. Hello iPhone 6, Watch
Sizes, prices and all – but not for the wrist-o-puter
A SCORCHIO fatboy SSD: Samsung SSD850 PRO 3D V-NAND
4Gb/s speeds on a consumer drive, anyone?
prev story

Whitepapers

Providing a secure and efficient Helpdesk
A single remote control platform for user support is be key to providing an efficient helpdesk. Retain full control over the way in which screen and keystroke data is transmitted.
WIN a very cool portable ZX Spectrum
Win a one-off portable Spectrum built by legendary hardware hacker Ben Heck
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Security and trust: The backbone of doing business over the internet
Explores the current state of website security and the contributions Symantec is making to help organizations protect critical data and build trust with customers.