The Register® — Biting the hand that feeds IT

AMD cuts to the core with 'Bulldozer' Opterons

The future is modular

IT shops buy current products, but they always have their eyes out one or two generations to assure themselves they aren't buying into a dead-end product. Which is why makers of chips and other components that go into systems as well as system makers themselves are forced to talk about the future when what they really want to do is focus on this quarter, right now. And so it is with the future "Bulldozer" cores expected in 2011 from Advanced Micro Devices.

The pressure to compete now and in the future is high, and the competition between AMD and Intel is intense. The etching on the six-core "Istanbul" Opteron 2400 and 8400 processors, launched in June, is barely dry, and the chips have barely ramped to volume among the server makers. But in September, AMD talked up its future homegrown chipsets, and in November, it trumpeted the next generation of Opteron processors, the "Magny-Cours" Opteron 6100s for two-socket and four-socket servers and the "Lisbon" Opteron 4100s for uniprocessor and two-socket boxes.

With the Rev F iterations of the Opteron chips - which are based on the original "K8" core design and which put two, four, and then six cores on a single die - AMD basically took a cookie-cutter approach to adding cores, plunking down multiple identical cores, each complete with all the circuits it would need if it were the only processor in a system. With the Bulldozer cores (which are not called the K9 generation, by the way, perhaps because AMD does not want any chip to be affiliated with a dog), AMD is being a little more clever.

Instead of having a core as the basic building block, the Bulldozer core is implemented as what AMD is calling a module. Take a look at this pretty picture:

[Diagram: The Opteron Bulldozer multicore module]

In the diagram above, the core is not really a core in the traditional sense of the word. Some elements of what we have been thinking of as a core are shared across multiple integer and floating point units in the Bulldozer design, while others are doubled up as you might expect from past Opteron designs.

"By sharing some components, we can reduce both power consumption and costs, but also scale performance," says John Fruehe, director of server product marketing at AMD, who walked El Reg through the Bulldozer design.

The "core" in the Bulldozer design is a single-threaded, four-pipeline integer unit, which as you can see will have its own scheduler and its own L1 cache. This is essentially the same structure as the K8 Opteron integer unit, according to Fruehe, who says that 90 percent of the workload an Opteron has to cope with runs through the integer unit. Rather than giving each core its own fetch and decode unit, the Bulldozer puts a single, slightly wider fetch and decode unit on the module, which the two cores share.

As you can see in the diagram, the Bulldozer module has a shared floating point scheduler and two 128-bit floating point units, which debuted with the quad-core "Barcelona" Opteron 2300s and 8300s two years ago. (Each of these FP units can do two 64-bit double-precision operations per clock or four 32-bit single-precision operations.) What is neat about the Bulldozer design is that either "core" in the module can grab the scheduler, and if the other core is not doing floating point, it can take all 256 bits and do four double-precision or eight single-precision ops in a clock.
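
For those who like to check the sums, the per-clock figures above can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch, not anything from AMD: the function name `fp_ops_per_clock` is made up for illustration, and it assumes that when both cores are doing floating point work, each one gets a single 128-bit unit. The article's figures only cover the per-unit rate and the case where one core takes all 256 bits.

```python
FPU_WIDTH_BITS = 128   # width of each floating point unit, per the article
FPUS_PER_MODULE = 2    # two FP units sit behind one shared FP scheduler


def fp_ops_per_clock(operand_bits: int, other_core_using_fp: bool) -> int:
    """Peak FP operations one core can do per clock (simplified model)."""
    if operand_bits not in (32, 64):
        raise ValueError("operand_bits must be 32 (single) or 64 (double)")
    # Assumption: if the sibling core is busy with FP work, the two units
    # are split evenly; otherwise one core can take both (all 256 bits).
    units = 1 if other_core_using_fp else FPUS_PER_MODULE
    return units * FPU_WIDTH_BITS // operand_bits


for busy in (True, False):
    for bits in (64, 32):
        ops = fp_ops_per_clock(bits, busy)
        print(f"sibling doing FP={busy}, {bits}-bit operands: {ops} ops/clock")
```

With the sibling core idle on the floating point side, the model gives four double-precision or eight single-precision operations per clock, matching the 256-bit case described above.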

My Quad core PC still hangs for a few seconds while Windows decides it needs to refresh something or other.

I'm sure another 12 cores will make no difference.

Ditto startup - that always seems to wait, rather than really running stuff in parallel.

Anonymous Coward

Actually, the opposite of Cell

Cell has one general purpose core, and many special purpose coprocessor elements. Bulldozer has many general purpose cores, and two FP coprocessor elements. If AMD put GPUs in the mix with a dual-core Bulldozer module, that would be more like Cell.

The Bulldozer core structure is a lot like UltraSPARC T2, which has two integer pipelines and one FPU per core, and the aborted Rock processor, which had four integer pipelines and one FPU per core.

Actually, by using existing Opteron integer pipelines, and putting four of them into a core and sharing the L1 cache, Bulldozer is philosophically very similar to the UltraSPARC T2, which basically replicated two existing UltraSPARC T1 pipelines in the core with a shared L1 cache.

@John Savard

Not an expert, but it's too easy an assumption that one can just increase the number of instruction stages. Pipelines can already be quite long, and each stage would have to take about the same time to execute or it would stall. But execution time, for example, can be very variable, from a simple shift to an integer sqrt (some PPCs had this, I think), and a mem fetch can be getting on two orders of magnitude different depending on whether it hits L0 cache or misses everything and reads RAM. Also, longer pipelines = longer bubbles. I don't think AMD techs would miss something so obvious.

Long Pipeline

They ought to have multiple threads per core; not two, like Intel's HyperThreading, but eight or more. Because if you don't have a long pipeline, that means you're not cutting the instructions into small enough pieces.

Without having to devise faster transistors, just keeping more of them busy, one could achieve a clock frequency that is four times higher by cutting the instructions into four times as many parts. Unless, of course, they took that as far as it could go back in the days of the Pentium 4.

Hmmm..

Clouded Leopard... How... Prophetic.

