AMD to double up cores with Jaguars

And maybe finally a Cat server variant

Hot Chips: For those of us hoping that Advanced Micro Devices would get into the low-powered server racket in earnest, it looks like the forthcoming processors based on the "Jaguar" cores will fit the bill quite nicely.

The Jaguars are the successors to the current "Bobcat" family of x86 processors used in the ceepie-geepie hybrids bearing the Fusion APU brand, but in years gone by, AMD pooh-poohed the idea that a Cat family chip would ever find a spot in servers.

With the Jaguars, this could change, particularly with the significant increase in core count, cache, and main memory capacity that is being put into the design.

Jeff Rupley, chief architect of the Jaguar processors at AMD, gave no hints in his presentation at the Hot Chips 24 conference in Cupertino on Monday about where the Jaguar chips would be used (or where they would not be), nor about what their target clock frequencies would be. Rupley was only there to talk about the Jaguar architecture and deflected such questions.

But the specs make it clear that if a Jaguar chip is suitable for cloud-optimized clients and other low-power devices such as tablets, then there is no good reason why a bunch of them could not be ganged up and crammed into a dense chassis of microservers to run Windows or Linux workloads that have only modest performance requirements and where server density matters much more. It could turn out that the performance per watt and performance per dollar per watt of a server-ized Jaguar chip beat those of a low-voltage Opteron 3300 or 4300.

Microarchitecture of the Jaguar core

No matter what, the Jaguar-based processors will have the benefit of moving to new 28 nanometer processes. That's plural: AMD has designed Jaguar so it can be dual-sourced from either GlobalFoundries or Taiwan Semiconductor Manufacturing Corp, both of which make desktop and laptop processors for AMD these days. (TSMC also makes AMD's graphics processors, as it does for Nvidia.) With the shrink from 40 nanometers to 28 nanometers, AMD is doing a number of things with the Jaguar chips.

First, it is doubling up the core count while making some substantial changes to the cache memory structure relative to that used in the Bobcat-based chips. There are also a number of microarchitecture tweaks to boost the performance per clock cycle (the same thing is expected with the "Piledriver" cores for Opteron server chips later this year), plus additions to the instruction set, including support for AVX vector math.

The Jaguar design has four cores running along the bottom of the chip, with an L2 cache interface riding on top of the cores that links out to the northbridge and to four banks of L2 cache memory holding a total of 2MB, shared by the four cores. That works out to 512KB of L2 cache per core, the same amount as in the Bobcats, where each core had its own private 512KB L2.
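To make the cache arithmetic concrete, here is a small Python sketch. The 2MB total, four banks, and four cores come from the article; the 64-byte line size and the rule that consecutive cache lines rotate across the banks are assumptions chosen for illustration, not AMD's documented bank-selection scheme, and `bank_for_address` is a hypothetical helper.

```python
# Figures from the article: 2MB of L2 in four banks, shared by four cores.
L2_TOTAL_BYTES = 2 * 1024 * 1024
NUM_CORES = 4
NUM_BANKS = 4
# Assumption: 64-byte cache lines, interleaved across banks line by line.
LINE_BYTES = 64

def per_core_share(total=L2_TOTAL_BYTES, cores=NUM_CORES):
    """Average L2 capacity per core if the shared cache were split evenly."""
    return total // cores

def bank_for_address(addr, line_bytes=LINE_BYTES, banks=NUM_BANKS):
    """Hypothetical bank select: consecutive cache lines go to consecutive banks."""
    return (addr // line_bytes) % banks

print(per_core_share() // 1024)  # 512 (KB per core)
print([bank_for_address(a) for a in range(0, 5 * LINE_BYTES, LINE_BYTES)])  # [0, 1, 2, 3, 0]
```

Interleaving on line boundaries is simply one common way to spread traffic across banks; the real hashing could differ.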

The Bobcat cores supported the SSE, SSE2, SSE3, and SSSE3 media processing instructions that are compatible with Intel chips, plus AMD's own SSE4A, and the Jaguars add support for SSE4.1 and SSE4.2 instructions. The Jaguar chips will also have a 40-bit physical memory address space, up from 36 bits with the Bobcats, which means they will be able, in theory, to address a lot more main memory.
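On Linux, the instruction set extensions a core supports show up as flags in `/proc/cpuinfo`, where `sse4_1`, `sse4_2`, and `avx` name the Jaguar additions mentioned here. The sketch below checks a flags line for them. `missing_flags` and `local_flags_line` are hypothetical helper names, and the Bobcat-style sample line is made up for illustration rather than copied from real hardware.

```python
from pathlib import Path

# Jaguar additions named in the article, spelled as Linux /proc/cpuinfo flags.
JAGUAR_NEW_FLAGS = {"sse4_1", "sse4_2", "avx"}

def missing_flags(flags_line, wanted=JAGUAR_NEW_FLAGS):
    """Return, sorted, the wanted flags absent from a 'flags : ...' line."""
    _, _, flags = flags_line.partition(":")
    return sorted(wanted - set(flags.split()))

def local_flags_line(path="/proc/cpuinfo"):
    """Return the first 'flags' line on an x86 Linux box, else None."""
    p = Path(path)
    if not p.exists():
        return None
    for line in p.read_text().splitlines():
        if line.startswith("flags"):
            return line
    return None

# Made-up sample resembling a Bobcat-era core ("pni" is Linux's name for SSE3).
sample = "flags : fpu sse sse2 pni ssse3 sse4a"
print(missing_flags(sample))  # ['avx', 'sse4_1', 'sse4_2']

line = local_flags_line()
if line is not None:
    print(missing_flags(line))  # depends on the machine running this
```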

That takes the ceiling from 64GB at 36 bits, which is already a hell of a lot of memory for a laptop or tablet, up to 1TB at 40 bits. That big physical memory increase could mean that AMD is indeed planning server variants of Jaguar Fusion APUs, which would be very interesting if the on-chip Radeon GPUs could be made to handle some offloaded mathematical calculations.
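The jump in addressable memory is just a matter of powers of two, counted in binary gigabytes and terabytes as the 64GB and 1TB figures are. A minimal Python check:

```python
GIB = 1 << 30  # binary gigabyte
TIB = 1 << 40  # binary terabyte

def addressable_bytes(address_bits):
    """Bytes reachable with a physical address of the given width."""
    return 1 << address_bits

print(addressable_bytes(36) // GIB)  # 64 -> 64GB with Bobcat's 36 bits
print(addressable_bytes(40) // TIB)  # 1 -> 1TB with Jaguar's 40 bits
print(addressable_bytes(40) // addressable_bytes(36))  # 16 -> four extra bits, 16x the memory
```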

The floating point unit in the Jaguar is being boosted as well, according to Rupley. In the Bobcat cores, the FP unit had a two-wide decoder with two execution pipelines that could handle 64-bit processing. With Jaguar, the FP unit gets 128-bit processing and a 128-bit wide data path. The FP unit will be able to do four single-precision multiplies and four single-precision adds at the same time; it will also be able to issue one double-precision multiply and two double-precision adds per clock.
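Those per-clock figures give peak floating point throughput once you multiply by core count and clock speed. AMD gave no Jaguar clock frequencies, so the 1.0GHz below is a placeholder, not a product spec, and the helper names are hypothetical.

```python
# Per-core FP operations per clock, as described for the Jaguar FP unit.
FLOPS_PER_CLOCK_PER_CORE = {
    "sp": 4 + 4,  # four single-precision multiplies plus four adds
    "dp": 1 + 2,  # one double-precision multiply plus two adds
}

def peak_flops_per_clock(cores, precision="sp"):
    """Peak FP operations per clock across all cores."""
    return FLOPS_PER_CLOCK_PER_CORE[precision] * cores

def peak_gflops(cores, ghz, precision="sp"):
    """Peak GFLOPS at a given clock; ghz is a caller-supplied assumption."""
    return peak_flops_per_clock(cores, precision) * ghz

PLACEHOLDER_GHZ = 1.0  # hypothetical: AMD did not disclose Jaguar clocks
print(peak_flops_per_clock(4, "sp"))  # 32
print(peak_flops_per_clock(4, "dp"))  # 12
print(peak_gflops(4, PLACEHOLDER_GHZ, "sp"))  # 32.0
```

Peak numbers like these assume every pipe is busy every cycle, which real code rarely manages.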

If you double-pump the FP unit, sending each 256-bit AVX instruction through the 128-bit hardware as two halves, you can do one 256-bit AVX vector math instruction per clock. This 128-bit FP and 256-bit AVX processing is as good as what a Bulldozer, Piledriver, or Steamroller core in the Opteron server chips can do.
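Double-pumping can be pictured as running a 256-bit AVX instruction, which holds eight single-precision floats, through four-lane 128-bit hardware in two passes. This Python model only counts passes; it ignores timing, pipelining, and scheduling, and the function names are hypothetical.

```python
LANES_128 = 4  # single-precision floats in a 128-bit register
LANES_256 = 8  # single-precision floats in a 256-bit AVX register

def add_128(a, b):
    """One pass through a 128-bit add unit: four lanes at once."""
    if len(a) != LANES_128 or len(b) != LANES_128:
        raise ValueError("expected four lanes")
    return [x + y for x, y in zip(a, b)]

def add_256_double_pumped(a, b):
    """Run a 256-bit add as two 128-bit passes; return (result, passes)."""
    if len(a) != LANES_256 or len(b) != LANES_256:
        raise ValueError("expected eight lanes")
    low = add_128(a[:LANES_128], b[:LANES_128])   # first pass: lanes 0-3
    high = add_128(a[LANES_128:], b[LANES_128:])  # second pass: lanes 4-7
    return low + high, 2

result, passes = add_256_double_pumped([1.0] * 8, [float(i) for i in range(8)])
print(result)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(passes)  # 2
```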

The L1 instruction and data caches on each core will stay the same at 32KB each with the Jaguar design, but prefetchers and load/store units have lots of tweaks to make them hum along more efficiently. The integer execution unit is essentially the same, with its schedulers able to issue two instructions, one load, and one store per clock.

The core enhancements added about 4 per cent to instructions per clock (IPC) with Jaguar over Bobcat, and with the other tweaks included, the IPC improvement adds up to more than 15 per cent. That's not too shabby for a tweak to an existing architecture.
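Performance scales with IPC times clock frequency, so at a fixed clock an IPC gain translates directly into speedup. If the 4 per cent core gain and the other tweaks compound multiplicatively (an assumption; the article does not break down the split) and the total is taken as exactly 15 per cent, the other tweaks would need to supply roughly 10.6 per cent:

```python
def relative_performance(ipc_gain, clock_ratio=1.0):
    """Speedup from an IPC gain (0.15 = 15%) and a clock ratio (new/old)."""
    return (1 + ipc_gain) * clock_ratio

def remaining_gain(total_gain, known_gain):
    """Gain the other changes must supply if gains compound multiplicatively."""
    return (1 + total_gain) / (1 + known_gain) - 1

# Assumption: 15% taken as the exact total; the article says "more than 15".
print(round(remaining_gain(0.15, 0.04), 3))  # 0.106
print(round(relative_performance(0.15), 2))  # 1.15 at an unchanged clock
```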

AMD Jaguar core floor plan

All of these components are spread out on the Jaguar chip in an "amoeba-like" floor plan that Rupley says "took a lot of blood, sweat, and tears" to come up with and that was created using tools developed by the ATI side of the house to build AMD's GPUs. "We had some initial floor plans that were really terrible," admits Rupley, as the CPU designers learned to use the GPU tools better.

The Bobcat core weighs in at 4.9 square millimeters using the 40 nanometer process at TSMC, and if Jaguar were implemented in the same process it would take up about 10 per cent more area, according to Rupley. But luckily for AMD and its customers, Jaguar cores will be implemented in 28 nanometer processes and will need only 3.1 square millimeters of space. ®
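Rupley's area figures can be compared against the idealized rule of thumb that die area scales with the square of the feature size. Real 28nm processes do not shrink every structure by that factor, so the "ideal" number is an assumption for comparison only:

```python
BOBCAT_40NM_MM2 = 4.9       # Bobcat core on TSMC 40nm
JAGUAR_40NM_PENALTY = 0.10  # Jaguar would be ~10% larger on the same process
JAGUAR_28NM_MM2 = 3.1       # Jaguar core on 28nm

jaguar_at_40nm = BOBCAT_40NM_MM2 * (1 + JAGUAR_40NM_PENALTY)
actual_scale = JAGUAR_28NM_MM2 / jaguar_at_40nm  # same design, new process
ideal_scale = (28 / 40) ** 2                     # assumed: area ~ feature size squared
vs_bobcat = JAGUAR_28NM_MM2 / BOBCAT_40NM_MM2    # new core vs old core

print(round(jaguar_at_40nm, 2))  # 5.39
print(round(actual_scale, 2))    # 0.58
print(round(ideal_scale, 2))     # 0.49
print(round(vs_bobcat, 2))       # 0.63
```

Under these assumptions the shrink recovers somewhat less area than ideal scaling would predict, but a Jaguar core still ends up about 37 per cent smaller than a Bobcat core.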
