Big chip for big boxes: IBM cracks open lid on Power7
IBM has divulged some specs of its forthcoming Power7 chips and their related Power Systems servers, throwing down the gauntlet to its peers.
The company confirmed last fall that the Power7 chip would span up to eight cores and would use a 45 nanometer manufacturing process developed at IBM's East Fishkill, New York foundry. It then lifted the veil on Power7 a little higher in July when it said that the chip would come in variants with four, six, or eight cores activated.
More importantly for customers who are nervous about investing in Power6 and Power6+ systems today, with Power7 machines just around the corner in early 2010, IBM also said that it would offer field upgrades from existing Power 570 and Power 595 boxes.
Ron Kalla, the chief engineer of the Power7 chip - who held the same position on the dual-core Power5 chip that was detailed six years ago - and Balaram Sinharoy, the Power7 core chief architect, gave a presentation at the Hot Chips conference at Stanford University, California, on Tuesday that spilled a lot more details about the future IBM chip.
As El Reg told you it might last week, the Power7 chip does indeed include embedded DRAM, which is being used as a fat L3 cache for the cores on the die. The eDRAM, at 32MB, is twice as large as many had expected, and comprises a large portion of the 1.2 billion transistors on the 567 square-millimeter chip.
IBM is using a 45 nanometer copper/SOI process to make the chip, and Kalla says that the functions on the chip and the clever way that IBM has implemented them make its 1.2 billion transistors equivalent to 2.7 billion transistors.
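For a sense of scale, here is a back-of-the-envelope sketch in Python that uses only the figures quoted above: 1.2 billion transistors, the 2.7 billion equivalent that Kalla cites, and the 567 square-millimeter die. The names are ours, not IBM's, and the density figure is a plain average that ignores how differently logic and eDRAM are packed.

```python
# Figures quoted in the article; names are illustrative, not IBM's.
TRANSISTORS = 1.2e9             # actual transistor count
EQUIVALENT_TRANSISTORS = 2.7e9  # what Kalla says the design is equivalent to
DIE_AREA_MM2 = 567              # die size in square millimeters


def equivalence_ratio():
    """Equivalent transistors claimed per actual transistor."""
    return EQUIVALENT_TRANSISTORS / TRANSISTORS


def average_density_per_mm2():
    """Average transistors per square millimeter across the whole die."""
    return TRANSISTORS / DIE_AREA_MM2


if __name__ == "__main__":
    print(f"equivalence ratio: {equivalence_ratio():.2f}x")
    print(f"average density: {average_density_per_mm2() / 1e6:.2f} million per mm^2")
```

On those numbers, each real transistor is doing the work of about 2.25, and the die averages roughly 2.1 million transistors per square millimeter.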
Each Power7 core has 12 execution units: two fixed-point units, two load-store units, four double-precision floating-point units, one vector unit (for doing matrix math), one decimal floating-point unit (for doing money math), one branch unit, and one condition register unit.
The cores support out-of-order execution and are - obviously - binary compatible with the prior Power6 and Power6+ chips. The pipeline for the Power7 cores has been reworked again, just as it was for the Power5 and Power6 generations. (And in the case of the Power6, with less than expected results.)
The Power7 core has 32KB of L1 instruction cache and 32KB of L1 data cache. Each core sports simultaneous multithreading that delivers four virtual threads per core, and has 256KB of L2 cache tightly coupled to it.
Each 4MB segment of the L3 cache is implemented in eDRAM and is affiliated with one of the eight cores. The eDRAM might be a bit slower than static RAM, but it is a lot closer than off-chip DRAM, which sits, from the point of view of an electron, a zillion miles away out on the DDR3 DIMMs.
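The per-core numbers add up quickly. The sketch below, in Python, simply multiplies the per-core figures quoted in this article by the core count. It assumes the full eight-core variant; how the caches are configured on the four-core and six-core parts has not been spelled out here, so the function should not be read as describing them. All names are hypothetical.

```python
# Per-core figures as quoted in the article; the eight-core part is assumed.
CORES = 8
THREADS_PER_CORE = 4        # simultaneous multithreading, four ways
L1_KB_PER_CORE = 32 + 32    # instruction cache plus data cache
L2_KB_PER_CORE = 256
L3_MB_PER_CORE = 4          # eDRAM segment affiliated with each core


def chip_totals(cores=CORES):
    """Multiply the per-core figures out to whole-chip totals."""
    return {
        "threads": cores * THREADS_PER_CORE,
        "l1_kb": cores * L1_KB_PER_CORE,
        "l2_kb": cores * L2_KB_PER_CORE,
        "l3_mb": cores * L3_MB_PER_CORE,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

With all eight cores switched on, that is 32 threads, 512KB of L1, 2MB of L2, and the 32MB of eDRAM L3 that IBM quotes.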
This L3 is obviously not large enough to act as a main memory for such a large chip, and the Power7 chip has two dual-channel DDR3 memory controllers implemented on the chip that deliver 100GB/sec of sustained bandwidth per chip.
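To put the 100GB/sec figure in per-core terms, here is a minimal Python sketch that divides it evenly across the eight cores and their 32 threads. The even split is our assumption for illustration only: real workloads do not share memory bandwidth evenly, and IBM has quoted no per-core number. The function and constant names are hypothetical.

```python
# Quoted per-chip figure; the even split below is an illustrative assumption.
SUSTAINED_GB_PER_SEC = 100
CORES = 8
THREADS_PER_CORE = 4


def even_share(total_gb_per_sec, consumers):
    """Bandwidth each consumer would get if the total were split evenly."""
    if consumers <= 0:
        raise ValueError("consumers must be positive")
    return total_gb_per_sec / consumers


def per_core_gb_per_sec():
    return even_share(SUSTAINED_GB_PER_SEC, CORES)


def per_thread_gb_per_sec():
    return even_share(SUSTAINED_GB_PER_SEC, CORES * THREADS_PER_CORE)


if __name__ == "__main__":
    print(f"per core: {per_core_gb_per_sec()} GB/sec")
    print(f"per thread: {per_thread_gb_per_sec()} GB/sec")
```

Split evenly, that works out to 12.5GB/sec per core, or a little over 3GB/sec per thread.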