Intel Core i7 'Nehalem' processor and X58 chipset
We put the chip giant's new architecture to the test
Review In appearance, the new Intel Core i7 - based on the 'Nehalem' microarchitecture - looks like a bigger, chunkier version of the Core 2 Quad but under the heat spreader and casing it has a radical design that breaks new ground.
New ground, that is, for Intel, but Core i7 seems to have rather a lot in common with AMD's Phenom microprocessor. Both CPUs have four cores on a single die, unlike the pair of dual-core CPUs you’ll find inside a Core 2 Quad. Both have the memory controller integrated inside the processor.
In addition, Core i7 has ditched the frontside bus and moved to the QuickPath Interconnect (QPI), which bears a strong resemblance to the HyperTransport bus AMD uses. QPI is the new name for Intel's erstwhile HyperTransport rival, Common System Interface (CSI).
Intel has adopted a base clock speed of 133.33MHz that is used to drive the CPU speed, memory speed, QPI and the bizarrely named Uncore. Each part works in conjunction with a clock multiplier so, for instance, the 3.2GHz Core i7 965 Extreme runs at 24 x 133MHz, while its memory controller might run the 1066MHz DDR 3 memory at 8 x 133MHz. Each processor core has its own multiplier, so the speed of the cores can be adjusted independently of each other, just like Phenom. That may sound intriguing to the overclockers among you, but it’s only part of the story.
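The multiplier scheme is easy to check with a few lines of Python. This is only a sketch of the arithmetic: the base clock and the two multipliers come from the figures above, while the function name is our own invention.

```python
# Sketch of the base-clock-times-multiplier arithmetic described above.
# The 133.33MHz base clock and the 24x and 8x multipliers are the review's
# figures; domain_speed_mhz is a made-up helper name.
BCLK_MHZ = 400 / 3  # 133.33MHz


def domain_speed_mhz(multiplier, bclk_mhz=BCLK_MHZ):
    """Speed of one clock domain: its multiplier times the shared base clock."""
    return multiplier * bclk_mhz


cpu_mhz = domain_speed_mhz(24)    # Core i7 965 Extreme cores
memory_mhz = domain_speed_mhz(8)  # DDR 3 memory

print(f"CPU: {cpu_mhz:.0f}MHz")
print(f"Memory: {memory_mhz:.0f}MHz")
```

The memory figure comes out at 1066.67MHz, which is where the '1066MHz' label comes from.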
Intel’s Turbo Mode technology adjusts the speed of the cores in the new processor dynamically and can raise the speed of a core by up to three multiples of the base clock, i.e. 400MHz. Turbo Mode assists both performance and power saving as there are times when you're better off with two fast cores rather than four slower ones. The processor speed and power draw can adapt to the workload while monitoring the temperature of the cores to avoid overheating. Turbo Mode is assisted by the introduction of what Intel calls power gates: transistors that don’t suffer from leakage when they're turned off, so a core that is shut down doesn’t waste power.
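To put numbers on that, here is a rough Python sketch of the Turbo Mode headroom. It assumes only what is stated above, namely that a core can gain up to three extra multiplier steps. The word 'bins' and the function name are our own shorthand, and the sketch says nothing about how the processor actually decides when to step up.

```python
# Rough sketch of the Turbo Mode headroom described above.
# Assumption: a core may gain up to three extra multiplier steps ("bins").
# How many bins apply for a given number of active cores is not modelled.
BCLK_MHZ = 400 / 3   # 133.33MHz base clock
MAX_TURBO_BINS = 3   # "up to three multiples of the base clock"


def turbo_speed_mhz(base_multiplier, bins, bclk_mhz=BCLK_MHZ):
    """Core speed after adding a number of turbo bins to the base multiplier."""
    if not 0 <= bins <= MAX_TURBO_BINS:
        raise ValueError("bins must be between 0 and MAX_TURBO_BINS")
    return (base_multiplier + bins) * bclk_mhz


stock = turbo_speed_mhz(24, 0)
boosted = turbo_speed_mhz(24, MAX_TURBO_BINS)
print(f"Stock: {stock:.0f}MHz, maximum turbo: {boosted:.0f}MHz, "
      f"headroom: {boosted - stock:.0f}MHz")
```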
Inside the Core i7
The Power Control Unit accounts for one million transistors and holds its operations in firmware loaded from the motherboard Bios, so the way the CPU operates can be updated with relative ease.
If you overclock the base clock then all of the other clock speeds are affected but if you choose to work with the multipliers you can change the speed of one part of the processor without necessarily affecting another part.
The Core i7 965: overclock block removed
The core of the CPU comprises the computational units, branch prediction, the cache, and other bits and pieces such as the registers. Everything outside the core is classed as the Uncore, and two big chunks of Uncore are the DDR 3 memory controller and the QPI link between the CPU and the system logic chipset's northbridge. That leaves some rather important odds and ends such as the L3 cache, power management and - potentially - integrated graphics.
Core i7 gives every indication that it's a modular design which can be developed in a number of different directions. So server chips might have even more L3 cache and QPI links, while a highly integrated desktop chip could cut the amount of L3, slim down the memory controller and add a graphics core.
The area of the Core i7 die is a sizeable 263mm² which is larger than the 214mm² of the four-core Core 2 Extreme QX9650. Yet the Core i7 has fewer transistors than the Core 2 Extreme: 731m to 820m. No doubt the amount of cache in the two generations of processor is responsible for one of those changes, as the QX9650 has 12MB of L2, while Core i7 has 256KB of L2 cache per core and 8MB of shared L3 cache for a total of 9MB.
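A quick back-of-the-envelope calculation shows how far apart the two dies are in transistors per square millimetre. The Python below uses only the die areas, transistor counts and cache sizes quoted above. The function name is ours, and average density is a crude measure that ignores how differently cache and logic pack onto the silicon.

```python
# Back-of-the-envelope comparison using the figures quoted above.
# Average density is a crude measure: cache and logic pack very differently.
def density_m_per_mm2(transistors_millions, area_mm2):
    """Average transistor density in millions of transistors per mm²."""
    return transistors_millions / area_mm2


core_i7 = density_m_per_mm2(731, 263)
qx9650 = density_m_per_mm2(820, 214)

# Core i7 cache total: four cores x 256KB of L2 plus 8MB of shared L3.
core_i7_cache_kb = 4 * 256 + 8 * 1024
qx9650_cache_kb = 12 * 1024

print(f"Core i7: {core_i7:.2f}m transistors per mm²")
print(f"QX9650:  {qx9650:.2f}m transistors per mm²")
print(f"Cache: {core_i7_cache_kb / 1024:.0f}MB versus {qx9650_cache_kb / 1024:.0f}MB")
```

On these numbers the Core i7 averages roughly 2.8m transistors per mm² against 3.8m for the QX9650.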
However, that doesn’t explain why the Core i7 die has such a large area so we’re going to take a guess that Intel has left space on the die to allow it to make changes to the feature set without a radical overhaul of the silicon.
An alternative explanation is that the apparent missing area is due to a change that Intel has made in the type of transistors that it uses in Core i7. The L1 and L2 caches contain eight transistors per memory cell which means the cache can use a lower power configuration. However, the larger L3 cache uses a traditional six-transistor-per-cell design.
Key features of the three 'Bloomfield' LGA1366 Core i7 processors that are due to launch later in November are the 45nm fabrication process and the integrated DDR 3 memory controller. We'll see similar technology in AMD’s 'Shanghai' Opteron and 'Deneb' desktop processors, but one Core i7 feature we don’t expect to see replicated by AMD is the use of HyperThreading. Yes, the quad-core Core i7 has eight virtual cores.
Intel's Core i7 965 Extreme in CPU-Z
This is something of a puzzle, as you rarely come across software that makes proper use of four cores so eight virtual cores seems like complete overkill.
Core i7 supports SSE4.2 which will doubtless reap some rewards with optimised software. Intel officially supports 800MHz and 1066MHz DDR 3 memory, which seems rather slow as 1600MHz was a feature of X48-based motherboards and it was common to overclock memory to 1800MHz or even 2000MHz. Clearly, Intel is relying on the bandwidth offered by triple-channel memory to overcome the relative lack of speed, but it's also serious about saving power and warns that if you pump too much juice into your Ram, you might damage your CPU.
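The trade-off between channel count and memory speed can be sketched in Python. Two things here are assumptions on our part rather than figures from Intel's briefing: each DDR 3 channel is treated as 64 bits (eight bytes) wide, and the 'MHz' ratings are treated as millions of transfers per second. The results are theoretical peaks, not measurements.

```python
# Theoretical peak memory bandwidth, as a sketch.
# Assumptions (not from the review): each DDR 3 channel is 64 bits wide,
# and the "MHz" ratings are really millions of transfers per second.
BYTES_PER_TRANSFER = 8  # one 64-bit channel


def peak_bandwidth_gbs(channels, megatransfers_per_s):
    """Peak bandwidth in GB/s for a number of channels at a given rate."""
    return channels * BYTES_PER_TRANSFER * megatransfers_per_s / 1000


triple_1066 = peak_bandwidth_gbs(3, 1066)
dual_1333 = peak_bandwidth_gbs(2, 1333)
dual_1600 = peak_bandwidth_gbs(2, 1600)

print(f"Triple-channel 1066MHz: {triple_1066:.1f}GB/s")
print(f"Dual-channel 1333MHz:   {dual_1333:.1f}GB/s")
print(f"Dual-channel 1600MHz:   {dual_1600:.1f}GB/s")
```

On paper, then, three channels of 1066MHz memory roughly match two channels of 1600MHz memory.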
‘Too much’ is 1.5V, although Asus says that it has successfully tested its X58-based P6T motherboard with 1.65V coursing through the memory. We have previously used as much as 1.95V in DDR 3 memory, so this is a dramatic change but it seems entirely consistent with Intel’s approach to the P45 chipset, which moves away from the need to pump large amounts of power through the chipset when you want to overclock your processor.
Intel's Core i7 920 in CPU-Z
On the subject of chipsets, the only pairing for the initial models of Core i7 is the Intel X58, which includes the ICH10R southbridge. Nvidia has a QPI bus licence and could theoretically manufacture chipsets for the Bloomfield variant of Core i7 but it either ran out of development time or considered that it was a poor use of development resources.
Either way, for now the only chipset in the game is the X58, and that presented Nvidia with a problem, as its graphics people are keen to sell as many gaming GPUs as possible. This dilemma has been resolved by allowing SLI to run on X58 provided the motherboard vendor has paid the necessary fee to Nvidia. We understand this fee stands at $5 per motherboard and at present the only motherboard manufacturer to refuse the payment is Intel itself. If you buy an Asus, ECS, EVGA, Gigabyte or MSI X58 motherboard you can be fairly sure that it will be able to run CrossFire and SLI, while the Intel DX58SO board can ‘only’ run CrossFire.
The implications of this move are huge as it means that you could build a gaming PC around X58 and Bloomfield without having to worry about the type of graphics card you plan to run in 2009 or 2010. It also means that the hateful nForce 200 PCI Express chip is unlikely to make an appearance on many motherboards as X58 supports PCI Express (PCIe) in 2 x 16 or 4 x 8 configuration. If you’re desperate for Tri-SLI with 3 x 16 PCIe 2.0, you will need the extra chip but the majority of gamers can expect their motherboard choice to become simpler.
Intel's X58 chipset schematic
They can also expect the motherboard to use less extensive – and less expensive – cooling as X58 is essentially X48 without the memory controller, although it has a QPI link to the CPU. The new chipset has less work to do than before and doesn’t require a heat spreader and as an added benefit your overclocking efforts are likely to leave the chipset voltage unchanged. Hurrah for common sense.
The first three members of the Core i7 family to go on sale are LGA1366 Bloomfield CPUs. The Core i7 920 runs at 2.66GHz and sells for $284 in batches of 1000, which currently means a UK price of £270. The Core i7 940 is clocked at 2.93GHz and sells for $562, which is a steep £493 retail over here.
The fastest Bloomfield is Core i7 965 Extreme, which has a clock speed of 3.2GHz and the usual Extreme price of $999. This would usually equate to £650, however we are seeing it on the web at £881. Holy Mother of !*?!!!
The 920 and 940 have a QPI bandwidth of 4.8 'gigatransfers' per second which is 9.6GB/s in each direction or 19.2GB/s overall. The 965 Extreme has a bandwidth of 6.4GT/s or 12.8GB/s in each direction for a total bandwidth of 25.6GB/s. The 965 Extreme is unlocked in the same way that all Extreme processors are unlocked, but the Core i7 is described as having its "Overspeed Protection removed".
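The QPI figures follow a simple pattern, sketched below in Python. The two bytes of data per transfer in each direction is inferred from the numbers above (9.6GB/s at 4.8GT/s) rather than taken from an Intel specification, and the function name is our own.

```python
# Sketch of the QPI bandwidth arithmetic quoted above.
# Assumption: two bytes of data per transfer in each direction, inferred
# from the review's own figures (9.6GB/s at 4.8GT/s).
BYTES_PER_TRANSFER = 2


def qpi_bandwidth_gbs(gigatransfers_per_s):
    """Return (per-direction, total) QPI bandwidth in GB/s."""
    per_direction = gigatransfers_per_s * BYTES_PER_TRANSFER
    return per_direction, per_direction * 2


for model, rate in (("Core i7 920/940", 4.8), ("Core i7 965 Extreme", 6.4)):
    each_way, total = qpi_bandwidth_gbs(rate)
    print(f"{model}: {each_way:.1f}GB/s each way, {total:.1f}GB/s overall")
```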
Intel's DX58SO mobo: CrossFire, yes; SLI, no
Bloomfield won’t be the only variant of Nehalem as Intel has some LGA1160 versions up its corporate sleeve. The quad-core 'Lynnfield' and dual-core 'Havendale' will use a DMI bus with dual-channel DDR 3 memory. We may well see Nvidia chipsets for these processors although it's hard to see what they would be able to bring to the party. In addition to the X58 chipset, we noted that the Intel INF driver also applies to chipset models 5520 and 5550, so it seems that there are more Nehalem chipsets in the works.
That’s the theory but now it’s time to see how Core i7 performs.
Intel said, 'test these.' So we did.
Our Intel-supplied review kit consists of a hefty box of goodies that includes a Core i7 920, Core i7 965 Extreme, DX58SO motherboard, three 1GB 1066MHz DDR 3 DIMMs from Qimonda, a regular Intel heatsink, a hefty Thermalright heatsink and an 80GB X25-M Solid State Drive.
Although we love the X25-M dearly, its 80GB capacity makes it impractical for everyday use and its £500-600 cost makes it an exotic treat. For the purposes of this review we’re sticking with a 1TB WD Caviar Black, which is a decent hard drive and which neither slows nor flatters the Core i7.
Asus' Rampage II Extreme mobo: CrossFire and SLI
The other decision we took is to test the new CPUs with an Asus Rampage II Extreme motherboard that supports both CrossFire and SLI. We used a relatively puny GeForce 8800GT during our tests as the emphasis was on system and CPU performance, but we have also done some quick runs with a pair of GeForce GTX 280 cards in SLI and intend to do a follow-up on Core i7 and X58 for gaming.
Although the Intel DX58SO motherboard looks very interesting, it seems a bit daft to exclude the possibility of SLI while we’re getting familiar with this new processor and chipset.
Our starting point is the Core 2 Extreme QX9650 overclocked from its stock speed of 3.0GHz to 3.16GHz on an Intel DX38BT with 2GB of dual-channel DDR 3 running at 1333MHz. This is a damn fine PC that represents the best of the current generation of technology so it is telling to compare it with the Core i7 Extreme at its stock speed of 3.2GHz with triple-channel memory running at 1066MHz.
SiSoft Sandra and PCMark05 show that the triple-channel memory has stacks more bandwidth than the dual-channel DDR 3 and also has lower latency, just as we would expect. PCMark05 awarded extra marks for the memory element of the test and was also favourable towards the Core i7 CPU, which was intriguing as the clock speed of the two processors was nearly identical.
SiSoft Sandra results: bandwidth in gigabytes per second (longer bars are better)
SiSoft Sandra results: latency in nanoseconds (shorter bars are better)
The synthetic POV-Ray benchmark rated both processors evenly, but recoding a 350MB AVI file in DivX 6.8 was a massive triumph for Core i7. DivX seems to be an early adopter of new SSE instructions but the jump from Core 2 to Core i7 is quite remarkable.
AVI conversion results: time in seconds (shorter bars are better)
The Asus Rampage II has CPU Level Up settings in the Bios that allow for easy overclocking with the delightful pre-set names i7-crazy-3.60G and i7-crazy-4.00G. The 3.6GHz setting raises the 133MHz base clock to 150MHz with the multiplier at 24x and memory at 8x, while the 4.0GHz setting is 26 x 155MHz and 6 x 155MHz memory. That’s only 930MHz, so we raised the memory speed to the next setting which is 1282MHz. We would have expected the memory speed to be 1240MHz (8 x 155MHz) and wonder whether this might be due to the effects of dynamic overclocking.
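The preset arithmetic can be checked with a short Python sketch. It uses the nominal base clocks and multipliers quoted above. The final step assumes the memory really was running at an 8x multiplier when the Bios reported 1282MHz, which we have not been able to confirm, so treat the implied base clock as a guess.

```python
# Sketch of the CPU Level Up arithmetic, using the nominal figures above.
# Assumption: the 1282MHz memory setting corresponds to an 8x multiplier.
PRESETS = {
    "i7-crazy-3.60G": {"bclk_mhz": 150, "cpu_multiplier": 24},
    "i7-crazy-4.00G": {"bclk_mhz": 155, "cpu_multiplier": 26},
}


def cpu_speed_mhz(preset):
    """Nominal CPU speed: base clock times CPU multiplier."""
    return preset["bclk_mhz"] * preset["cpu_multiplier"]


def implied_bclk_mhz(observed_mhz, multiplier):
    """Base clock that would produce an observed speed at a given multiplier."""
    return observed_mhz / multiplier


for name, preset in PRESETS.items():
    print(f"{name}: {cpu_speed_mhz(preset)}MHz")

expected_memory_mhz = 8 * PRESETS["i7-crazy-4.00G"]["bclk_mhz"]
implied = implied_bclk_mhz(1282, 8)
print(f"Expected memory speed at 8x: {expected_memory_mhz}MHz")
print(f"Base clock implied by 1282MHz at 8x: {implied:.2f}MHz")
```

If that 8x assumption holds, the reported 1282MHz points to a base clock of about 160MHz rather than the nominal 155MHz.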
The effects on performance of the increased clock speed were as impressive as we would have hoped, but the power draw was worryingly high. At 3.2GHz, the Core i7 965 was slightly more juicy than the QX9650 but when we overclocked the Core i7, the power consumption climbed to the heavens and showed no sign of the clever power-saving technologies we had hoped to see.
POV-Ray rendering time results: time in seconds (shorter bars are better)
Once we had Core i7 running at 4GHz, we monkeyed around with the memory to see just what effect triple-channel was having. To get the memory in dual-channel mode, we pulled out one module and naturally enough saw that memory bandwidth plunged. However, the effect on performance was fairly minor.
Interestingly, we saw power consumption under load drop by 15W, which suggests that the memory controller in the processor core is working rather hard.
We installed 3GB of memory in single-channel mode and once again found that performance was barely affected, but no doubt we shall find that games and other large workloads will benefit from the bandwidth.
Power draw results: power in Watts
Having got impressive results from eight virtual processor cores we thought it would be a good idea to do a back-to-back comparison with the Intel D5400XS Skulltrail platform, which has dual quad-core QX9770 processors. We overclocked the processors to 4.0GHz and found that performance fell somewhere between the Core 2 QX9650 and Core i7, although the POV-Ray result was nothing short of epic. Unfortunately, the power draw was also colossal.
Our final series of test runs used the 2.66GHz Core i7 920 which can be easily overclocked to 2.93GHz or 3.20GHz. This isn’t exactly the same as the 3.2GHz Core i7 965 as the QPI bus of the 920 has less bandwidth. The performance of the 920 was indeed lower than that of the 965, although the gap was very narrow.
It’s hard to sum up our feelings about Core i7 in a few sentences but we’ll give it a try. Intel’s new processor seems to owe very little to Core 2 yet it behaves like Core 2 on steroids. At any given clock speed, you get more performance out of Core i7. On the downside, it also demands more power.
The value of triple-channel memory is unclear but the inclusion of the new controller looks like a good idea as the extra bandwidth will certainly be used in some applications. Moving the memory controller from the chipset to the CPU is undoubtedly the right idea, and if AMD did it first, so what? In fact, that might be the best way to describe Core i7: it’s just like AMD’s Phenom, only done properly.
Core i7 takes over where Core 2 tails off and it delivers an impressive level of performance while raising clock speeds by only a small amount. The move away from the antique frontside bus is welcome and the Turbo Mode looks promising, but the power saving features seem to need some development.