Big chip for big boxes: IBM cracks open lid on Power7
IBM has divulged some specs of its forthcoming Power7 chips and their related Power Systems servers, throwing down the gauntlet to its peers.
The company confirmed last fall that the Power7 chip would span up to eight cores and would use a 45 nanometer manufacturing process developed at IBM's East Fishkill, New York foundry. It then lifted the veil on Power7 a little higher in July when it said that the chip would come in variants with four, six, or eight cores activated.
And, more importantly for customers who are nervous about investing in Power6 and Power6+ systems today when Power7 machines are just around the corner in early 2010, IBM added that it would offer field upgrades from existing Power 570 and Power 595 boxes.
Ron Kalla, the chief engineer of the Power7 chip - who also held that position on the dual-core Power5 chip that was detailed six years ago - and Balaram Sinharoy, the Power7 core chief architect, gave a presentation at the Hot Chips conference at Stanford University, California, on Tuesday that spilled a lot more details about the future IBM chip.
As El Reg told you it might last week, the Power7 chip does indeed include embedded DRAM, which is being used as a fat L3 cache for the cores on the die. The eDRAM, at 32MB, is twice as large as many had expected, and comprises a large portion of the 1.2 billion transistors on the 567 square-millimeter chip.
IBM is using a 45 nanometer copper/SOI process to make the chip, and Kalla says that the functions on the chip and the clever way that IBM has implemented them make its 1.2 billion transistors equivalent to 2.7 billion transistors.
Each Power7 core has 12 execution units: two fixed point units, two load store units, four double-precision floating point units, one vector unit (for doing matrix math), one decimal floating point unit (for doing money math), one branch unit, and one condition register unit.
The cores support out-of-order execution and are - obviously - binary compatible with the prior Power6 and Power6+ chips. The pipeline for the Power7 cores has been reworked again, just as it was for the Power5 and Power6 generations. (And in the case of the Power6, with less than expected results.)
The Power7 core has 32KB of L1 instruction cache and 32KB of L1 data cache. Each core sports simultaneous multithreading that delivers four virtual threads per core, and has 256KB of L2 cache tightly coupled to it.
Each 4MB segment of the L3 cache implemented in eDRAM is affiliated with one of the eight cores. The eDRAM might be a bit slower than static RAM, but it is a lot closer than off-chip DRAM that sits, from the point of view of an electron, a zillion miles away out on the DDR3 DIMMs.
This L3 is obviously not large enough to act as a main memory for such a large chip, and the Power7 chip has two dual-channel DDR3 memory controllers implemented on the chip that deliver 100GB/sec of sustained bandwidth per chip.
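For anyone who wants to check the cache math, here is a back-of-the-envelope sketch in Python that tallies the on-chip cache from the per-core figures IBM quoted. The constant and function names are made up for illustration, and the idea that a chip with fewer active cores keeps one full L3 segment per active core is an assumption, not something IBM has confirmed.

```python
# Per-core cache sizes as quoted in the Hot Chips presentation, in KB.
L1_INSTRUCTION_KB = 32
L1_DATA_KB = 32
L2_KB = 256
L3_SEGMENT_KB = 4 * 1024  # one 4MB eDRAM segment per core


def on_chip_cache_kb(active_cores: int) -> dict:
    """Tally cache per level for a chip with the given number of active cores.

    Assumption: every active core keeps its full L3 segment. IBM has not
    said how much L3 stays enabled on the four-core and six-core variants.
    """
    return {
        "l1": active_cores * (L1_INSTRUCTION_KB + L1_DATA_KB),
        "l2": active_cores * L2_KB,
        "l3": active_cores * L3_SEGMENT_KB,
    }


totals = on_chip_cache_kb(8)
print(totals["l3"] // 1024, "MB of L3 on an eight-core chip")
```

With all eight cores active, the eight 4MB segments add up to the 32MB of eDRAM that IBM quoted.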
IBM will offer Power7 systems that use an updated SMP clustering technology that's more efficient and delivers something closer to linear scalability than prior Power5 and Power6 machines did at the high end of the IBM AIX and OS/400 server lineup.
IBM plans to scale up Power7-based Power Systems as high as 32 sockets, which is where its Power5 and Power6 generations of Power 595 boxes scaled. The difference is that the Power7 boxes will have four times the cores and eight times the threads.
Specifically, a top-end Power7 server will have 32 sockets and 360GB/sec of SMP bandwidth per chip linking them together into a shared-memory system. That high-end machine - which would logically be called the Power 795, but IBM's naming conventions don't obey logic - would have 256 cores and 1,024 threads in total.
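The scaling claims are simple multiplication, and a rough Python sketch makes them easy to verify. The Power7 inputs come straight from the presentation; the Power6 inputs (two cores per chip, two threads per core) are not stated in IBM's disclosure and are filled in here as an assumption based on the Power6 design. The function name is hypothetical.

```python
def smp_totals(sockets: int, cores_per_chip: int, threads_per_core: int):
    """Return (total cores, total threads) for a shared-memory system."""
    cores = sockets * cores_per_chip
    return cores, cores * threads_per_core


# Power7 figures as given by IBM: 32 sockets, eight cores, four threads each.
power7_cores, power7_threads = smp_totals(32, 8, 4)

# Power6 figures are an assumption: dual-core chips with two threads per core.
power6_cores, power6_threads = smp_totals(32, 2, 2)

print(power7_cores, "cores,", power7_threads, "threads")
print(power7_cores // power6_cores, "times the cores")
print(power7_threads // power6_threads, "times the threads")
```

Under those assumptions the 32-socket Power7 box comes out at 256 cores and 1,024 threads, which is four times the cores and eight times the threads of a 32-socket Power6 machine.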
IBM is not talking about clock speeds in particular, but Kalla explained that the top-end Power7 chips will offer more than four times the performance of a Power6 chip, and do so in the same thermal envelope. With all those extra threads and other tweaks, that probably means clock speeds ranging from 3GHz to 4GHz.
To save on energy, IBM has added features in the Power7 chip that allow cores to be turned off and on dynamically, and core frequencies to be set independently on individual cores. The design also allows threads to be switched off on each core, all the way down to a single thread - some workloads don't care about threads and want more clock speed on a single thread. And there's a turbo mode that allows a core's clock speed to be ramped up by 10 per cent.
Because IBM's Power chips are used in a variety of different platforms, and because getting good yields on such a large chip is difficult at first, IBM will offer different packaging for the Power7 parts.
For two-socket and four-socket blade and rack servers, Kalla says that IBM will create a single chip organic package that has one memory controller - my guess is that this package will have four cores and some of the eDRAM L3 cache deactivated because it has boogers on it.
Midrange and high-end Power7 servers will use a single-chip glass ceramic package that has both memory controllers activated and could sport four and probably six cores (IBM was not precise on the core count in each package) as well as SMP links for systems up to 32 sockets.
In a twist, future Power7 supercomputers will be getting their own quad-chip multichip module (MCM) package, which puts four Power7 chips and eight activated memory controllers on a single package. This is no doubt the configuration going into the Blue Waters massively parallel supercomputer that IBM is building for the National Center for Supercomputing Applications at the University of Illinois. It looks like two of these 32-core MCMs will be put into each 2U rack server in the Blue Waters system.
The Power7 chip is running in the IBM labs right now supporting its AIX and i operating systems as well as Linux.
IBM is one of the few companies left that makes its own chips and the servers that wrap around them, and unless something radical happens, Big Blue will continue to do so for the foreseeable future. ®