All the sauce on Big Blue's hot chip: More on Power7+
Clock crank, cache bump ... and maybe on-chip memory compression too
The Hot Chips 24 conference hosted by Stanford University is next week, and IBM, Oracle, Advanced Micro Devices, Fujitsu, and Intel are expected to talk tech relating to just-announced or impending processors. But Big Blue seems unable to contain its enthusiasm for the Power7+ chip that it will talk about alongside its next-generation zNext processors for its System z mainframes.
We have already told you about some of the details on the forthcoming chip that could be scrounged from poking around the Intertubes. From a die shot of the Power4 through Power7+ families of processors that IBM has shown to customers and partners, we were able to discover that the Power7+ chip has eight cores, just like the Power7 chip that precedes it.
It wasn't clear from looking at the die how much shared, on-chip embedded DRAM L3 cache memory was on the chip, but it was clearly more. Now, thanks to a performance document published on IBM's developerWorks site (PDF), we know that IBM is boosting the L3 cache size from 4MB for each local core segment on the Power7 chip (for a total of 32MB) to 10MB per core on the Power7+ chip (for a total of 80MB). This is a tremendous amount of cache memory and is four times what Intel has put on its latest "Sandy Bridge" Xeon E5 server processors.
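The cache arithmetic is simple enough to check: eight local L3 segments per chip, multiplied by the per-core figures, which also squares with the "2.5X L3 cache" claim IBM makes later. A quick Python sketch follows; the Xeon E5 number it prints is only what the "four times" comparison implies, not a figure taken from IBM's document.

```python
# Per-core L3 eDRAM figures quoted in the article, in megabytes.
CORES = 8
POWER7_L3_PER_CORE_MB = 4
POWER7PLUS_L3_PER_CORE_MB = 10


def total_l3_mb(per_core_mb: int, cores: int = CORES) -> int:
    """Total shared L3 on a chip with `cores` local L3 segments."""
    return per_core_mb * cores


power7_total = total_l3_mb(POWER7_L3_PER_CORE_MB)          # 32MB
power7plus_total = total_l3_mb(POWER7PLUS_L3_PER_CORE_MB)  # 80MB
growth = power7plus_total / power7_total                   # 2.5x

# Implied by the "four times" comparison, not a quoted Intel spec.
implied_xeon_e5_l3_mb = power7plus_total / 4

print(power7_total, power7plus_total, growth, implied_xeon_e5_l3_mb)  # 32 80 2.5 20.0
```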
All that extra cache memory, which should have a dramatic effect on performance, is made possible by the shrink from the 45 nanometer process used to etch the Power7 chips to the 32 nanometer process used for the Power7+ chips. But there are other changes to the chip beyond shrinking the cores (which are basically the same) and wrapping more cache around them. IBM's roadmaps have been talking about accelerators, and if you poke around patches to the Linux kernel, you can see what some of them are. As previously reported:
- In this post at the Linux kernel driver database, we see that the Power7+ will have an in-nest cryptographic accelerator that supports the Advanced Encryption Standard (AES) encryption algorithm as well as the Secure Hash Algorithm-2 (SHA-2) functions developed by the National Security Agency in the United States. (Hash functions are used all over the place in code and microcode alike.)
- This link at the Linux-Crypto site talks about driver support for an on-chip AES accelerator. (Intel's Xeon 5600, E5, and E7 processors support AES encryption and decryption, and Oracle's Sparc T4 supports both AES encryption and SHA-1 and SHA-2 hashing functions.)
- This link suggests there will be a random number generator etched onto the Power7+ processor. RNGs are also an important part of many applications, particularly financial services and physics simulations that require randomness.
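From application code, accelerators like these are usually invisible: programs call the same hashing and random-number interfaces, and any offload happens underneath in the crypto library or kernel driver. The Python sketch below illustrates that calling side only. It assumes the offload, if any, sits below the standard library (hashlib is typically backed by OpenSSL, and os.urandom reads the kernel's random source); nothing in it is Power7+-specific, and AES is left out because Python's standard library has no AES cipher.

```python
import hashlib
import os


def sha2_digests(data: bytes) -> dict:
    """Compute SHA-2 family digests through the standard library.

    Whether this runs on a hardware accelerator depends on the underlying
    crypto library and platform; the calling code is the same either way.
    """
    return {
        "sha224": hashlib.sha224(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha384": hashlib.sha384(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


def random_bytes(n: int) -> bytes:
    """Read n bytes from the OS random source (e.g. /dev/urandom on Linux),
    which a kernel hardware-RNG driver can help seed."""
    return os.urandom(n)


digests = sha2_digests(b"Power7+")
print(digests["sha256"])
print(len(random_bytes(16)))  # 16
```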
IBM's chipheads were talking to the Wall Street Journal about the upcoming Hot Chips conference, and Satya Sharma let slip that the clock speeds on Power7+ chips would be 10 to 20 percent higher than those on the Power7. Sharma is an IBM Fellow and CTO of the Power Systems line who leads the development of the Power7 and Power7+ processors.
Power7 clock speeds range from a low of 3GHz, on a four-core chip used in the Power 720 entry server, to a high of 3.92GHz in the Power 780 with all eight cores turned on, and 4.14GHz in that chip running in turbo boost mode with half the cores turned off. You'd also get 4GHz in an eight-core chip used in the Power 795 and 4.25GHz in a four-core variant also used in that big box. Applying a 10 to 20 percent boost to that spread puts the possible range of clock speeds for Power7+ chips between 3.3GHz and 5.1GHz, but there could be some wiggle room, as IBM might get more clock out of the smaller chips and less out of the larger ones. (Traditionally, IBM revs the processors on its biggest boxes faster to boost single-thread performance, so this would be a departure.)
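The 3.3GHz to 5.1GHz range comes from applying the low end of Sharma's range (10 percent) to the slowest Power7 part and the high end (20 percent) to the fastest. A back-of-the-envelope Python sketch of that calculation, using the Power7 clock speeds listed above:

```python
# Power7 clock speeds cited in the article, in GHz.
power7_clocks_ghz = {
    "Power 720, four cores": 3.00,
    "Power 780, eight cores": 3.92,
    "Power 780, turbo, four cores active": 4.14,
    "Power 795, eight cores": 4.00,
    "Power 795, four cores": 4.25,
}


def scaled_range(clocks, low_bump, high_bump):
    """Apply the low bump to the slowest part and the high bump to the fastest."""
    return (round(min(clocks) * (1 + low_bump), 2),
            round(max(clocks) * (1 + high_bump), 2))


low, high = scaled_range(power7_clocks_ghz.values(), 0.10, 0.20)
print(low, high)  # 3.3 5.1
```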
Die shot of the IBM Power7+ processor for Power Systems iron
I had guessed that IBM would boost the clock speed on the Power7+ chips by 25 to 30 percent, with the top-bin parts spinning above 5GHz, in the same range as the z11 engines used in the current System zEnterprise 114 and 196 machines; the z11 is a quad-core chip that spins at up to 5.2GHz. (IBM will also apparently be boosting the clock speed on the zNext processor to 5.5GHz, up from the 5.2GHz used on the top-end z11 processor in the current System z line.) We'll find out about the clock speeds next week from the presentations at Hot Chips.
El Reg asked Big Blue for clarification on the statements made about the Power7+ chip in the WSJ, and this is what came back:
"Power7+ leverages 32 nanometer technology to provide increased frequency, 2.5X L3 cache, security enhancements, and memory compression with no increased power over previous generation Power7 chips."
The interesting bit in that statement is a reference to "memory compression." In 2010, the AIX 6.1 operating system was given a feature called Active Memory Expansion: a data compression algorithm, implemented in software and tied to the Power7 processors, that could achieve 2:1 compression on main memory. This compression does two things: it lets more data live in main memory, and it lets the system drive CPU utilization higher, pushing more work through it.
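To make the idea concrete, here is a toy Python sketch. It uses zlib purely as a stand-in compressor, since IBM's Active Memory Expansion algorithm is not described here, and it treats the expansion factor as a simple multiplier on physical memory. In practice only part of memory is kept compressed and the compression itself burns CPU cycles, so real gains are smaller and depend on how compressible the workload's data is.

```python
import zlib

PAGE_SIZE = 4096  # bytes; a common page size, assumed for illustration


def compression_ratio(pages):
    """Ratio of uncompressed to compressed size over a list of pages.

    zlib is a stand-in here, not IBM's Active Memory Expansion algorithm.
    """
    raw = sum(len(p) for p in pages)
    packed = sum(len(zlib.compress(p)) for p in pages)
    return raw / packed


def effective_memory_gb(physical_gb, expansion_factor):
    """Memory the workload sees if pages are held compressed in RAM."""
    return physical_gb * expansion_factor


# Highly repetitive pages compress far better than 2:1; random data would not.
sample_pages = [bytes([i % 7]) * PAGE_SIZE for i in range(8)]
ratio = compression_ratio(sample_pages)
print(ratio > 2.0)                   # True
print(effective_memory_gb(16, 2.0))  # 32.0
```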
On one benchmark test (PDF) running SAP ERP applications on a 12-core Power7 server with 18GB of physical memory, the memory was maxed out but the CPU was only at 46 per cent, and the machine handled only 1,000 SAP users and delivered 99 transactions per second. With Active Memory Expansion turned on under AIX 6.1, the box was able to boost effective main memory by 37 per cent, to 24.7GB. The SAP workload could then push CPU utilization up to 88 per cent (some of it overhead from the memory compression itself), and the machine now supported 1,700 users and did 166 transactions per second. That's 70 per cent more users doing roughly 68 per cent more work.
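The percentages in that test can be rechecked from the raw before-and-after figures. A small Python sketch, using only the numbers reported for the 12-core Power7 box:

```python
# SAP ERP results on the 12-core Power7 box, as reported in the test.
before = {"memory_gb": 18.0, "cpu_util": 0.46, "users": 1000, "tps": 99}
after = {"memory_gb": 24.7, "cpu_util": 0.88, "users": 1700, "tps": 166}


def pct_gain(old, new):
    """Percentage increase from old to new, rounded to a whole number."""
    return round((new / old - 1) * 100)


for key in ("memory_gb", "users", "tps"):
    print(key, pct_gain(before[key], after[key]))
# prints:
# memory_gb 37
# users 70
# tps 68
```

By this arithmetic, effective memory rises 37 per cent, supported users 70 per cent, and transactions per second about 68 per cent.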
Active Memory Expansion imposed overhead on the Power7 cores, but it is possible that IBM has etched the memory-compression algorithms into the Power7+ chip, thereby taking that overhead off the processor cores. Also, if memory compression is etched onto the chip, then it presumably could be used by the Linux and IBM i operating systems, which do not currently support it. It will also presumably be a free feature rather than the extra-cost feature it was with the AIX-Power7 combo.
"There should be nothing surprising here, as IBM has always followed a model of mapping processor architectures in the next generation of silicon to improve the value to the customer," explained Ron Kalla, chief engineer at IBM for both the Power7 and Power7+ processors, in an email exchange. "If you go back all the way to the RS64 processors, we mapped those into multiple technologies, adding a few new features along the way. This time, between Power7 and Power7+, we used the technology slightly differently. We decided to hold the power envelope and die area constant so we can easily plug upgrade existing systems while providing increased frequency."
So the Power7+ chips will slide into the current Power7 sockets, which is a good thing for customers and IBM alike.
"We also invested the additional transistors provided by 32nm technology in a few ways," continued Kalla. "We added eDRAM cache, which provides a high performance return on area and added on chip accelerators to offload work from the processor cores so more workload can be done by the existing cores – this has the same effect as adding cores. We also made security enhancements to provide higher levels of protection for our customers' data."
IBM doesn't publish thermal ratings for its various Power processors, which come with four, six, or eight cores and varying clock speeds. (There may also be differences in L3 cache, but IBM has never said so.) We will try to get some sense of where they are in terms of power consumption and heat dissipation at Hot Chips next week.