
AMD to double up cores with Jaguars

And maybe finally a Cat server variant


Hot Chips For those of us hoping that Advanced Micro Devices would get into the low-powered server racket with some earnestness, it looks like the forthcoming processors based on the "Jaguar" cores will fit the bill quite nicely.

The Jaguars are the kickers to the current "Bobcat" family of x86 processors used in ceepie-geepie hybrids bearing the Fusion APU brand, but in years gone by, AMD pooh-poohed the idea that a Cat family chip would ever be put to work in servers.

With the Jaguars, this could change, particularly with the significant increase in core count, cache, and main memory capacity that is being put into the design.

Jeff Rupley, chief architect of the Jaguar processors at AMD, didn't give any hints about where the Jaguar chips would be used - or where they would not be - in his presentation at the Hot Chips 24 conference in Cupertino on Monday, or what the target clock frequencies would be for the Jaguars. Rupley was only there to talk about the Jaguar architecture and deferred on such questions.

But what is clear from the specs is that if a Jaguar chip is suitable for cloud-optimized clients and other low-power devices such as tablets, then there is no good reason that a bunch of them could not be ganged up and crammed inside of a dense chassis of microservers to run Windows or Linux workloads that have only modest performance requirements and where server density is much more important. It could turn out that the performance per watt and performance per dollar per watt of a server-ized Jaguar chip beats a low-voltage Opteron 3300 or 4300.

Microarchitecture of the Jaguar core


No matter what, the Jaguar-based processors will have the benefit of moving to new 28 nanometer processes. That's plural, and it means AMD has created the Jaguar design so it can be dual-sourced from either GlobalFoundries or Taiwan Semiconductor Manufacturing Corp, both of which make desktop and laptop processors for AMD these days. (TSMC also makes AMD's graphics processors, as it does for Nvidia.) With the shrink from 40 nanometers to 28 nanometers, AMD is doing a number of things with the Jaguar chips.

First, it is doubling up the core count while making some substantial changes to the cache memory structure relative to that used in the Bobcat-based chips. There are also a number of tweaks to the instruction set to boost the performance per clock cycle (the same thing is expected with the "Piledriver" cores for Opteron server chips later this year) and support for AVX vector math.

The Jaguar design has four cores running along the bottom of the chip, with an L2 cache interface riding on top of the cores that links out to the northbridge of the chipset and to four banks of L2 cache memory with a total of 2MB of cache. That's 512KB of L2 cache per core, the same as in the Bobcats.

The Bobcat cores supported the SSE media processing instructions that are compatible with Intel chips, from the original SSE up through SSE4A, and the Jaguars add in support for SSE4.1 and SSE4.2 instructions. The Jaguar chips will also have a 40-bit physical memory address space, up from 36 bits with the Bobcats, which means they will be able, in theory, to address a lot more main memory.

That's a jump from 64GB at 36 bits, which is still a hell of a lot of memory for a laptop or tablet, to 1TB at 40 bits. That big physical memory increase could mean that AMD is indeed planning server variants of Jaguar Fusion APUs, which would be very interesting if the on-chip Radeon GPUs could be made to do some offloaded mathematical calculations.
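The memory ceilings above fall straight out of the address width: n address bits reach 2^n bytes. The sketch below just does that arithmetic; note that the "GB" and "TB" figures are binary units (2^30 and 2^40 bytes), and that the function name is our own illustration, not anything from AMD. The result is the architectural ceiling only; how much memory a given chip and board can actually hold depends on the memory controller and platform.

```python
# Physical address width -> maximum addressable bytes.
# 2**36 bytes = 64 GiB (Bobcat); 2**40 bytes = 1 TiB (Jaguar).

GIB = 2 ** 30  # bytes per gibibyte


def max_physical_memory_bytes(address_bits: int) -> int:
    """Return how many bytes an address of the given bit width can reach."""
    return 2 ** address_bits


for name, bits in (("Bobcat", 36), ("Jaguar", 40)):
    size_gib = max_physical_memory_bytes(bits) // GIB
    print(f"{name}: {bits}-bit physical addresses -> {size_gib} GiB")
```

Four extra address bits multiply the reachable space by 2^4, which is why the ceiling grows sixteenfold rather than by some small increment.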

The floating point unit in the Jaguar is being boosted as well, according to Rupley. In the Bobcat cores, the FP unit had a two-wide decoder with two execution pipelines that could handle 64-bit processing. With Jaguar, the FP unit gets 128-bit processing and a 128-bit wide data path. The FP unit will be able to do four single-precision multiplies and four single-precision adds at the same time; it will also be able to issue one double-precision multiply and two double-precision adds per clock.

If you double pump the FP unit, you can do one 256-bit AVX vector math instruction per clock. This 128-bit FP and 256-bit AVX processing matches what a Bulldozer, Piledriver, or Steamroller core in the Opteron server chips can do.
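The width figures above translate into element counts with simple division: a 128-bit datapath holds four 32-bit single-precision values or two 64-bit double-precision ones, and a 256-bit AVX operation has to be split into two 128-bit halves, which is what "double pumping" refers to. The sketch below only counts lanes, passes, and the per-clock FLOPs implied by Rupley's quoted issue rates; it does not model pipeline timing, and since AMD gave no clock speeds, it stops short of any GFLOPS figure. The helper names are hypothetical.

```python
# Vector width arithmetic for the Bobcat and Jaguar FP units.

SP_BITS = 32  # IEEE 754 single precision
DP_BITS = 64  # IEEE 754 double precision


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of element_bits-wide values that fit in a vector_bits-wide datapath."""
    return vector_bits // element_bits


def passes(instruction_bits: int, datapath_bits: int) -> int:
    """Passes needed to push a wide operation through a narrower datapath (ceiling division)."""
    return -(-instruction_bits // datapath_bits)


BOBCAT_FP_BITS = 64   # Bobcat FP pipes handle 64-bit processing, per the article
JAGUAR_FP_BITS = 128  # Jaguar FP pipes handle 128-bit processing, per the article

print("Bobcat SP lanes per op:", lanes(BOBCAT_FP_BITS, SP_BITS))
print("Jaguar SP lanes per op:", lanes(JAGUAR_FP_BITS, SP_BITS))
print("Jaguar DP lanes per op:", lanes(JAGUAR_FP_BITS, DP_BITS))
print("128-bit passes per 256-bit AVX op:", passes(256, JAGUAR_FP_BITS))

# Peak FLOPs per clock per core, using the issue rates quoted by Rupley.
sp_flops_per_clock = 4 + 4  # four SP multiplies + four SP adds
dp_flops_per_clock = 1 + 2  # one DP multiply + two DP adds
print("SP FLOPs per clock:", sp_flops_per_clock)
print("DP FLOPs per clock:", dp_flops_per_clock)
```

Multiplying those per-clock figures by a core's clock frequency and core count would give a peak rate, once AMD says what the clocks are.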

The L1 instruction and data caches on each core will stay the same at 32KB each with the Jaguar design, but prefetchers and load/store units have lots of tweaks to make them hum along more efficiently. The integer execution unit is essentially the same, with its schedulers able to issue two instructions, one load, and one store per clock.
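Putting the cache sizes quoted so far together gives the on-chip cache budget for a four-core Jaguar. This is plain addition using only the article's numbers; the constant names are our own.

```python
# Cache capacity of a four-core Jaguar chip, from the sizes quoted in the article.

CORES = 4
L1I_KB = 32             # L1 instruction cache per core
L1D_KB = 32             # L1 data cache per core
L2_TOTAL_KB = 2 * 1024  # 2MB of L2 across four banks

l2_per_core_kb = L2_TOTAL_KB // CORES     # per-core share of the L2
l1_total_kb = CORES * (L1I_KB + L1D_KB)   # all L1 caches on the chip
total_kb = l1_total_kb + L2_TOTAL_KB      # L1 plus L2

print(f"L2 per core: {l2_per_core_kb}KB")
print(f"Total L1: {l1_total_kb}KB, total cache: {total_kb}KB")
```

The per-core L2 share works out to the same 512KB the Bobcats had, so the doubling of the chip's total L2 comes from doubling the core count rather than from fattening each core's slice.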

The core enhancements added about 4 per cent more performance with Jaguar over Bobcat in terms of instructions per clock (IPC), and other tweaks add up to more than 15 per cent better IPC. That's not too shabby for a tweak to an existing architecture.
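IPC is only half of the story, because delivered throughput is roughly IPC multiplied by clock frequency, and Rupley gave no clock targets. The sketch below uses that textbook first-order model to show how an IPC gain carries through; the 10 per cent clock reduction is a purely hypothetical number chosen for illustration, not anything AMD has said.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """First-order speedup of a new core over an old one: IPC ratio times clock ratio."""
    return ipc_ratio * clock_ratio


# At the same clock, a 15 per cent IPC gain is a 1.15x speedup.
same_clock = relative_performance(1.15)

# Hypothetical: if the new core ran at 90 per cent of the old clock,
# most of the IPC gain would be eaten up.
lower_clock = relative_performance(1.15, 0.90)

print(f"same clock: {same_clock:.3f}x, 10% lower clock: {lower_clock:.3f}x")
```

Real workloads also depend on memory latency and bandwidth, so this model is a rough guide rather than a prediction.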

AMD Jaguar core floor plan


All of these components are spread out on the Jaguar chip in an "amoeba-like" floor plan that Rupley says "took a lot of blood, sweat, and tears" to come up with and that was created using tools developed by the ATI side of the house to build AMD's GPUs. "We had some initial floor plans that were really terrible," admits Rupley, as the CPU designers learned to use the GPU tools better.

The Bobcat core weighs in at 4.9 square millimeters in area using the 40 nanometer process at TSMC, and if Jaguar were implemented in the same process it would have about 10 per cent more area, according to Rupley. Luckily for AMD and its customers, Jaguar cores will be implemented in 28 nanometer processes and will only need 3.1 square millimeters of space. ®
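The area figures can be checked with a little arithmetic: a Jaguar core built at 40 nanometers would be about 4.9 × 1.10 ≈ 5.39 square millimeters, so the 3.1 square millimeter 28 nanometer version is roughly 57 per cent of that. For comparison, the sketch also computes the textbook "ideal" shrink of (28/40)², which assumes every feature scales with the node name; real processes never quite manage that, and node names are partly marketing labels, so treat the ideal figure as a loose yardstick only.

```python
# Core area arithmetic from Rupley's figures.

BOBCAT_40NM_MM2 = 4.9      # Bobcat core area at TSMC 40nm
JAGUAR_40NM_GROWTH = 1.10  # Jaguar would be about 10 per cent larger at 40nm
JAGUAR_28NM_MM2 = 3.1      # Jaguar core area at 28nm

jaguar_40nm_mm2 = BOBCAT_40NM_MM2 * JAGUAR_40NM_GROWTH  # same design, old process
shrink_factor = JAGUAR_28NM_MM2 / jaguar_40nm_mm2        # 28nm area vs 40nm area, same design
vs_bobcat = JAGUAR_28NM_MM2 / BOBCAT_40NM_MM2            # Jaguar at 28nm vs Bobcat at 40nm
ideal_shrink = (28 / 40) ** 2                             # idealized linear-dimension scaling

print(f"Jaguar at 40nm: {jaguar_40nm_mm2:.2f} mm^2")
print(f"Actual shrink: {shrink_factor:.3f}, ideal shrink: {ideal_shrink:.2f}")
print(f"Jaguar (28nm) vs Bobcat (40nm): {vs_bobcat:.3f}")
```

The upshot is that Jaguar packs a bigger core into about 63 per cent of Bobcat's area, which is what makes doubling the core count affordable.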
