The Register® — Biting the hand that feeds IT

Intel 'Nehalem' CPU 'borrows' AMD Phenom cache plan

L3 promoted, L2 demoted

Intel's 45nm 'Nehalem' processor architecture, due for release later this year, will see the chip maker adopt AMD's approach to cache structure: small per-core Level 1 and Level 2 caches connected to a big, shared Level 3 cache.

Nehalem, which will form the basis for two-, four- and eight-core processors, will contain 64KB of L1 cache per core, split 50:50 between memory reserved for program instructions and for data. That's how current Core 2 CPUs work, but while today's desktop and mobile CPUs complement that with big, multi-megabyte L2 caches shared between pairs of cores, each Nehalem core will get 256KB of L2 cache of its own.

Whether a chip has two, four or eight cores, all of them will then be able to access a shared pool of up to 8MB of L3 cache, taking as much or as little as they need for the threads they're running, up to the overall limit.
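To illustrate why a shared pool helps, here is a deliberately simplified sketch comparing one shared 8MB L3 against the same capacity carved into fixed per-core slices. The working-set sizes are hypothetical, and the helper names are illustrative only. A real L3 is filled and emptied by its replacement policy rather than by explicit requests, so this models only the capacity argument, not Nehalem's actual behaviour.

```python
def shared_allocation(demands_kb, pool_kb):
    """Toy model: cores draw from one shared pool until it runs out.

    Requests are granted in order; a real cache is managed by its
    replacement policy, so this only illustrates the capacity argument.
    """
    granted, remaining = [], pool_kb
    for want in demands_kb:
        take = min(want, remaining)
        granted.append(take)
        remaining -= take
    return granted


def static_allocation(demands_kb, pool_kb):
    """The same pool split into equal fixed slices, one per core."""
    share = pool_kb // len(demands_kb)
    return [min(want, share) for want in demands_kb]


# Hypothetical working-set sizes (KB) for the threads on a quad-core chip
demands = [5 * 1024, 1024, 512, 512]
print(shared_allocation(demands, 8 * 1024))  # [5120, 1024, 512, 512]
print(static_allocation(demands, 8 * 1024))  # [2048, 1024, 512, 512]
```

With a shared pool, the busy core can use 5MB because its neighbours need little; with fixed 2MB slices, it is capped at 2MB while capacity sits idle elsewhere.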

Intel's Nehalem: native quad-core

It's an approach AMD introduced with its Phenom chips. Earlier AMD processors gave each core both its own L1 cache and its own L2 cache. Intel previously pooh-poohed this design, claiming better performance could be achieved using a shared L2. Whatever the reason, the Phenom CPU line introduced a third tier of cache, this time shared.

The Phenom 9600, for example, has 2MB of L2 divided into four 512KB blocks, each assigned to a single core. All four cores share a further 2MB of L3. Each core has 128KB of L1 cache.
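A quick sketch for cross-checking the Phenom 9600 figures and setting them beside a quad-core Nehalem. Sizes are in KB and come from this article; Nehalem's L3 is assumed to be at its quoted 8MB maximum. The `CHIPS` table and `cache_summary` helper are illustrative names, not any vendor tool.

```python
# Cache sizes in KB, taken from the figures in this article (not spec sheets).
# Nehalem's L3 is assumed to be at the quoted 8MB maximum.
CHIPS = {
    "Phenom 9600": {"cores": 4, "l1": 128, "l2": 512, "l3": 2 * 1024},
    "Nehalem quad": {"cores": 4, "l1": 64, "l2": 256, "l3": 8 * 1024},
}


def cache_summary(chip: dict) -> dict:
    """Split a chip's cache into private per-core storage and the shared L3."""
    private_per_core = chip["l1"] + chip["l2"]
    return {
        "private_per_core_kb": private_per_core,
        "private_total_kb": private_per_core * chip["cores"],
        "shared_l3_kb": chip["l3"],
    }


for name, chip in CHIPS.items():
    print(name, cache_summary(chip))
```

On these numbers, the Phenom 9600 puts more of its cache into private per-core storage (2.5MB against 1.25MB), while Nehalem puts more into the shared pool (8MB against 2MB).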

It's as logical a move for Intel as it was for AMD. The exclusive L2 caches give each core a pool of fast-access memory, while the shared cache acts as a buffer, trapping data and instructions that other cores have requested, which another core can then grab more quickly than by going out to main memory or peeking into other cores' personal storage.
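The lookup order described above can be sketched as a toy model: each core checks its own private cache, then the shared L3, and only then main memory. This is an illustration under stated assumptions, not Nehalem's design. The `ToyHierarchy` class is hypothetical, caches are unbounded sets with no eviction or coherence traffic, and a fill from memory is assumed to be copied into the shared level as well as the private one.

```python
class ToyHierarchy:
    """Toy model: each core has a private cache; all cores share one L3.

    Unbounded sets, no eviction and no coherence traffic: it only shows
    where a lookup is satisfied, not real timing or capacity behaviour.
    """

    def __init__(self, cores: int):
        self.private = [set() for _ in range(cores)]  # stands in for L1 + L2
        self.shared_l3 = set()

    def load(self, core: int, addr: int) -> str:
        if addr in self.private[core]:
            return "private L1/L2"
        if addr in self.shared_l3:
            self.private[core].add(addr)
            return "shared L3"
        # Miss everywhere: fetch from memory, keep a copy at both levels
        # (an assumed inclusive-style fill policy)
        self.private[core].add(addr)
        self.shared_l3.add(addr)
        return "main memory"


chip = ToyHierarchy(cores=4)
print(chip.load(0, 0x1000))  # main memory: first touch, by core 0
print(chip.load(1, 0x1000))  # shared L3: core 1 finds core 0's data
print(chip.load(1, 0x1000))  # private L1/L2: now in core 1's own cache
```

The second call is the case the shared cache exists for: core 1 is served from L3 instead of going out to main memory or probing core 0's private cache.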

More to the point, Nehalem is essentially Intel's first design that doesn't build four-core CPUs out of two two-core dies; AMD's quad-core chips have been native designs for some time. With no shared L3, each core pair in today's Core 2 Quad and Core 2 Extreme processors has to look in the other pair's cache, which can hinder performance.

Each Nehalem core uses Intel's Hyper-Threading technology to run up to two threads simultaneously, allowing a four-core chip to appear to the host OS as an eight-core part.
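The arithmetic behind "a four-core chip appears as an eight-core part" is simply cores multiplied by hardware threads per core. A minimal sketch, assuming two threads per core as the article describes for Nehalem. `logical_cpus` is an illustrative helper; `os.cpu_count()` is the standard-library call that reports the logical CPU count the OS sees on whatever machine runs it.

```python
import os


def logical_cpus(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors an OS sees when every core runs
    `threads_per_core` hardware threads (two with Hyper-Threading)."""
    return physical_cores * threads_per_core


for cores in (2, 4, 8):
    print(f"{cores} cores -> {logical_cpus(cores)} logical CPUs")

# On a real machine, os.cpu_count() reports the logical count (or None if unknown).
print("This machine:", os.cpu_count())
```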

Nehalem will initially be a 'true' quad-core part, but Intel has promised future eight-core parts that are built natively rather than from a pair of quad-core CPUs bolted together.

The CPU design incorporates an out-of-order window running to 128 instructions, up from Core 2's 96 instructions. That allows the new chip to look ahead to a greater number of instructions to see which can be pulled out of the program sequence and processed without affecting the results of operations further down the line. It's also able to keep 33 per cent more micro-ops in flight at once than its predecessor could.
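The look-ahead argument can be illustrated with a toy model: count how many instructions in the window could start straight away because none of their inputs is still waiting on an earlier instruction. The workload is hypothetical, a long dependency chain followed by independent work. The model tracks only read-after-write dependencies, as if registers were renamed, and bears no relation to Nehalem's actual scheduler. The last line checks the 33 per cent figure from the 96 and 128 window sizes.

```python
def ready_in_window(program, window):
    """Count how many of the first `window` instructions could start at once,
    i.e. none of their source registers is written by an earlier instruction
    in the same window that has not run yet.

    Toy model: each instruction is (dest_register, source_registers), every
    instruction in the window is treated as not yet executed, and only true
    (read-after-write) dependencies are tracked, as if registers were renamed.
    """
    pending_writes = set()
    ready = 0
    for dest, srcs in program[:window]:
        if not any(src in pending_writes for src in srcs):
            ready += 1
        pending_writes.add(dest)
    return ready


# Hypothetical workload: a 100-instruction dependency chain on r1,
# followed by 28 instructions that depend only on r0 (never written here).
chain = [("r1", ["r1"])] * 100
independent = [(f"r{i}", ["r0"]) for i in range(2, 30)]
program = chain + independent

print(ready_in_window(program, 96))   # 1: a 96-entry window sees only the chain
print(ready_in_window(program, 128))  # 29: head of the chain plus all 28
print(f"{(128 - 96) / 96:.0%} larger window")  # 33% larger window
```

With the smaller window, the independent work sits beyond the horizon; the larger window reaches it and finds 28 more instructions that can be pulled out of sequence.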

Latest Comments

both sides getting the good ideas from each other.

to make things short:

1. AMD Phenom (and Barcelona) copied Intel Core 2 Duo's true 128-bit internal datapath.

2. AMD Phenom (and Barcelona) copied Intel Core 2 Duo's Fetch Cycle - 32 bytes (256 bits) of data per clock cycle.

3. L3 Shared Cache is old Intel server technology adopted by Phenom and Barcelona.

--

both sides getting the good ideas from each other.

--

now with Intel Nehalem:

1. adopted integrated memory controller concept.

2. made a HyperTransport-like bus and called it QuickPath Interconnect.

3. native/true quad/octo-core.

borrowed... really?

L3 shared cache is old Intel server technology adopted by the K10 architecture (Phenom and Barcelona). So did Intel really borrow this from AMD?

borrowed/copied... here are some facts:

1. The use of a true 128-bit internal datapath. On previous CPUs based on the K8 microarchitecture, the internal datapath was only 64 bits wide. This was a problem for SSE instructions, since SSE registers, called XMM, are 128 bits long. So when executing an instruction that manipulated 128-bit data, the operation had to be broken down into two 64-bit operations. The new 128-bit datapath makes the K10 microarchitecture faster than K8 at processing SSE instructions that manipulate 128-bit data.

Intel processors based on the Core microarchitecture (Core 2 Duo, for example) also have 128-bit internal datapaths, while Intel processors based on the NetBurst microarchitecture (Pentium 4 and Pentium D) have 64-bit internal datapaths.

AMD is calling this new feature “AMD Wide Floating Point Accelerator”.

2. The fetch unit fetches 32 bytes (256 bits) of data per clock cycle from the L1 instruction cache, double what CPUs based on the K8 architecture could fetch per clock cycle. Intel CPUs based on the Core microarchitecture, like Core 2 Duo, also fetch 32 bytes per clock cycle.

3. The K10 architecture adds a shared L3 memory cache (OLD INTEL SERVER CHIP TECHNOLOGY) inside the CPU. The size of this cache will depend on the CPU model, just as the size of the L2 cache does.

AMD calls this approach “Balanced Smart Cache”.

http://ocxt.multiply.com/journal/item/47/Inside_AMD_K10_Architecture.
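The datapath point in item 1 can be sketched in a few lines. The hypothetical `paddq_128` helper below models the result of an SSE2 packed 64-bit add (PADDQ) on a 128-bit value by doing two separate 64-bit additions, low lane then high lane, which is roughly what a 64-bit datapath has to do. It is a behavioural sketch only, not a model of K8 or K10 hardware, and it assumes both inputs are non-negative integers below 2**128.

```python
MASK64 = (1 << 64) - 1


def paddq_128(a: int, b: int) -> int:
    """Packed add of two 64-bit lanes held in one 128-bit value, done as
    two separate 64-bit operations. Each lane wraps modulo 2**64 and no
    carry crosses from the low lane into the high lane."""
    lo = ((a & MASK64) + (b & MASK64)) & MASK64  # first 64-bit operation
    hi = ((a >> 64) + (b >> 64)) & MASK64        # second 64-bit operation
    return (hi << 64) | lo


# High lane = 1, low lane = all ones; adding 1 wraps only the low lane.
a = (1 << 64) | MASK64
print(hex(paddq_128(a, 1)))  # 0x10000000000000000: high lane still 1, low lane 0
```

A 128-bit datapath can handle both lanes in one operation; a 64-bit one needs the two steps shown.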

Re: Borrowed

"To accuse Intel of borrowing AMD's plan is a bit like accusing bats of pinching the idea of wings from the birds, oblivious to the fact that insects got there first."

Thankfully the insects didn't have the foresight to patent their wings ;)

Doesn't that figure...

Not only does Intel play the role of abusive monopolist (which is about the only thing it's truly good at), they're stealing AMD's ideas too...

I would love to see Intel get broken up for abusing its market position...

Cheater

If Intel claims that they make the best processor, why do they have to borrow AMD's cache plan? Why don't they create their own architecture rather than using someone else's? And let's see how well they will stand up against the 45nm quad-core Shanghai, Deneb, Montreal, Suzuki, Bulldozer and Sandtiger.
