Power7 v Power6 - it's all about the cache

Double the thread count

With the Power7 chip, IBM is shrinking the chip down to a 45 nanometer copper/SOI process and crunching 1.2 billion transistors onto the die. The Power7 cores are not all that different from the Power6 and Power6+ cores. The Power7 core has 12 execution units: two fixed point units, two load store units, four double-precision floating point units, one vector unit, one decimal floating point unit, one branch unit, and one condition register unit.

The cores support out-of-order execution and are binary compatible with the prior Power chips. Each Power7 core has 32 KB of L1 instruction cache, 32 KB of L1 data cache, and 256 KB of L2 cache tightly coupled to it. The chip has 32 MB of L3 cache implemented in embedded DRAM (eDRAM) rather than static RAM (SRAM), and this is carved up into eight 4 MB segments, each affiliated with one of the eight cores.

The eDRAM is slower than SRAM, but is a lot closer to the cores. (This is important, and I will explain why in a second). The Power7 chip has two dual-channel DDR3 memory controllers implemented on the chip that deliver 100 GB/sec of sustained bandwidth per chip.

IBM's eight-core Power7 processor

There are a couple of big changes with the Power7 design, and all of them impact performance. First and foremost, the chip includes 32 MB of on-chip L3 cache memory implemented in embedded DRAM instead of the off-chip L3 cache that was used with all the prior dual-core Power chips. This, as it turns out, may be more important than boosting the threads and cores compared to the Power6 and Power6+ chips.

IBM has said that building that 32 MB of on-chip L3 cache out of static RAM instead of eDRAM would have boosted the transistor count to around 2 billion transistors. (Which is, by the way, about where the quad-core Tukwila will weigh in with its 30 MB of on-chip cache). According to Scott Handy, vice president of worldwide strategy and marketing for Power Systems, the eDRAM cache can store one bit of data using only one transistor and one capacitor instead of the six transistors needed for storing one bit using static RAM.
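To get a feel for the one-transistor versus six-transistor comparison, here is a back-of-the-envelope sketch. It counts only the cells that hold data bits in a 32 MB cache; ignoring tags, error correction bits, and peripheral circuitry is an assumption of this sketch, and the variable names are illustrative, not IBM's.

```python
# Rough storage-cell count for a 32 MB L3 cache (data bits only).
# Assumption: tags, ECC bits, and sense/decoder circuitry are ignored.
L3_BYTES = 32 * 1024 * 1024      # 32 MB of L3, as quoted in the article
L3_BITS = L3_BYTES * 8

SRAM_TRANSISTORS_PER_BIT = 6     # six-transistor SRAM cell
EDRAM_TRANSISTORS_PER_BIT = 1    # one transistor (plus one capacitor) per bit

sram_transistors = L3_BITS * SRAM_TRANSISTORS_PER_BIT
edram_transistors = L3_BITS * EDRAM_TRANSISTORS_PER_BIT
saved = sram_transistors - edram_transistors

print(f"SRAM cells:  {sram_transistors / 1e9:.2f} billion transistors")
print(f"eDRAM cells: {edram_transistors / 1e9:.2f} billion transistors")
print(f"Difference:  {saved / 1e9:.2f} billion transistors")
```

On these assumptions the data cells alone would cost roughly 1.6 billion transistors in SRAM against about 0.27 billion in eDRAM, which is the kind of gap Handy is describing.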

The effect of this eDRAM on the Power7 design, and its performance, is two-fold. First, by adding the L3 cache onto the chip, the latency between the cores and the L3 memory has been reduced by a factor of six, according to Handy. (The exact memory latency feeds and speeds were not available at press time). This means the Power7 cores are waiting a lot less for data than the previous Power cores were.

Second, by having that L3 cache take up a lot less space than it might otherwise, IBM could boost the core count by a factor of four, to eight cores on a die, and could double the thread count per core, to four. If it were not for the eDRAM, the Power7 chip might have looked a lot like Tukwila, with its transistor budget being half burned up by cache.

The Power7 chips that are being announced inside of four different Power Systems servers today run at 3 GHz, 3.3 GHz, 3.5 GHz, 3.55 GHz, 3.8 GHz, and 4.1 GHz. (IBM is using Power7 chips with six or eight working cores in the four boxes announced today). The latter two clock speeds are only available in the Power Systems 780 midrange server, and the higher 4.1 GHz clock speed is only available in the so-called TurboCore mode, when the system microcode is told to shut down half the cores in the eight-core chip so the processor can speed up from the 3.8 GHz it is allowed to run at with eight cores turned on.

In TurboCore mode, the four activated cores get access to all of the 32 MB of eDRAM L3 cache and to both memory controllers, and on database workloads where clocks and cache matter, this can boost performance by around 20 per cent. Moreover, the chips are actually rated to push up to 4.5 GHz, so Power Systems shops can overclock them further if the thermal conditions inside the servers allow for this, further boosting performance. Without overclocking, the Power7 cores - not the chips, but the cores - in the Power Systems 780 have about twice the database performance of those in the Power 570 machines using the Power6 and Power6+ chips.
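The TurboCore trade-off can be sketched with the figures quoted above. This is illustrative arithmetic only; how the roughly 20 per cent gain splits between clock speed, cache, and memory bandwidth is not something IBM has broken out, and the names below are made up for the example.

```python
# TurboCore arithmetic using the figures quoted in the article.
L3_MB = 32            # total on-chip eDRAM L3
ALL_CORES = 8         # cores active in normal mode
TURBO_CORES = 4       # cores left active in TurboCore mode
BASE_GHZ = 3.8        # clock with eight cores on (Power Systems 780)
TURBO_GHZ = 4.1       # clock in TurboCore mode

l3_per_core_normal = L3_MB / ALL_CORES
l3_per_core_turbo = L3_MB / TURBO_CORES
clock_gain_pct = (TURBO_GHZ / BASE_GHZ - 1) * 100

print(f"L3 per active core: {l3_per_core_normal:.0f} MB -> {l3_per_core_turbo:.0f} MB")
print(f"Clock gain: {clock_gain_pct:.1f} per cent")
```

The clock bump by itself works out to a little under 8 per cent, so most of the claimed gain would presumably have to come from each active core having twice the L3 cache and a larger share of the memory controllers.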

"The slowest speed bin Power7 core is faster than a 5 GHz Power6 core," brags Handy. It will be interesting to see that claim verified with some performance data.

Equally important for an IBM that is doing battle with Oracle and its 64-threaded Sparc T2 and T2+ chips and the quad-core, eight-threaded Tukwilas due from Intel today, the Power7 chip has 32 threads, eight times as many as the Power5 through Power6+ chips could bring to bear on workloads that like threads. One of those workloads is IBM's own WebSphere Application Server, and on early benchmark tests, shifting from a Power6 to Power7 system with the same number of cores boosted the performance of WebSphere running on AIX by 85 per cent.
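The thread arithmetic follows directly from the core and threads-per-core counts given earlier. Here is a minimal sketch, with a hypothetical Chip class invented purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical helper holding the per-chip figures quoted in the article."""
    name: str
    cores: int
    threads_per_core: int

    @property
    def threads(self) -> int:
        # Hardware threads per chip = cores x threads per core
        return self.cores * self.threads_per_core


power6 = Chip("Power6", cores=2, threads_per_core=2)   # dual-core, two threads per core
power7 = Chip("Power7", cores=8, threads_per_core=4)   # eight cores, four threads per core

ratio = power7.threads // power6.threads
print(f"{power6.name}: {power6.threads} threads, {power7.name}: {power7.threads} threads")
print(f"{power7.name} has {ratio} times the threads per chip")
```

Four times the cores and twice the threads per core multiply out to the eightfold jump in threads per chip.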

By the way, each Power7 chip has a feature called Intelligent Threads, which allows those virtual instruction streams to be turned on and off as conditions dictate. The Power7 processors and their systems also have something called Active Memory Expansion, a memory compression technology built into the chip for its main memory that IBM has not discussed before and has not provided much detail about as yet. It looks like AME offers 2:1 data compression on the DDR3 main memory, judging from the brief mention it got in the official announcement today. ®
