Big chip for big boxes: IBM cracks open lid on Power7
IBM has divulged some specs of its forthcoming Power7 chips and their related Power Systems servers, throwing down the gauntlet to its peers.
The company confirmed last fall that the Power7 chip would span up to eight cores and would use a 45 nanometer manufacturing process developed at IBM's East Fishkill, New York foundry. It then lifted the veil on Power7 a little higher in July when it said that the chip would come in variants with four, six, or eight cores activated.
More importantly for customers who are nervous about investing in Power6 and Power6+ systems today, with Power7 machines just around the corner in early 2010, IBM added that it would offer field upgrades from existing Power 570 and Power 595 boxes.
Ron Kalla, the chief engineer of the Power7 chip - who also held that position on the dual-core Power5 chip that was detailed six years ago - and Balaram Sinharoy, the Power7 core chief architect, gave a presentation at the Hot Chips conference at Stanford University, California, on Tuesday that spilled a lot more details about the future IBM chip.
As El Reg told you it might last week, the Power7 chip does indeed include embedded DRAM, which is being used as a fat L3 cache for the cores on the die. The eDRAM, at 32MB, is twice as large as many had expected, and comprises a large portion of the 1.2 billion transistors on the 567 square-millimeter chip.
IBM is using a 45 nanometer copper/SOI process to make the chip, and Kalla says that the functions on the chip and the clever way that IBM has implemented them make its 1.2 billion transistors equivalent to 2.7 billion transistors.
Each Power7 core has 12 execution units: two fixed point units, two load store units, four double-precision floating point units, one vector unit (for doing matrix math), one decimal floating point unit (for doing money math), one branch unit, and one condition register unit.
The cores support out-of-order execution and are - obviously - binary compatible with the prior Power6 and Power6+ chips. The pipeline for the Power7 cores has been reworked again, just as it was for the Power5 and Power6 generations. (And in the case of the Power6, with less than expected results.)
The Power7 core has 32KB of L1 instruction cache and 32KB of L1 data cache. Each core sports simultaneous multithreading that delivers four virtual threads per core, and has 256KB of L2 cache tightly coupled to it.
Each 4MB segment of the L3 cache implemented in eDRAM is affiliated with one of the eight cores. The eDRAM might be a bit slower than static RAM, but it is a lot closer than off-chip DRAM, which sits, from the point of view of an electron, a zillion miles away out on the DDR3 DIMMs.
This L3 is obviously not large enough to act as a main memory for such a large chip, and the Power7 chip has two dual-channel DDR3 memory controllers implemented on the chip that deliver 100GB/sec of sustained bandwidth per chip.
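For readers who want the per-chip totals, the per-core figures quoted above can be multiplied out. This is a minimal sketch in Python that assumes the fully enabled eight-core variant. The constant and function names are ours for illustration, not IBM's, and the article does not say whether the four-core and six-core variants keep the full L3.

```python
# Per-chip totals derived from the per-core figures quoted in the article.
# Assumption: all eight cores are active. Names here are illustrative only.
CORES = 8
THREADS_PER_CORE = 4   # SMT threads per core
L1I_KB = 32            # L1 instruction cache per core
L1D_KB = 32            # L1 data cache per core
L2_KB = 256            # L2 cache per core
L3_SEGMENT_MB = 4      # eDRAM L3 segment affiliated with each core


def chip_totals():
    """Return aggregate thread and cache figures for one eight-core chip."""
    return {
        "threads": CORES * THREADS_PER_CORE,
        "l1_kb": CORES * (L1I_KB + L1D_KB),
        "l2_kb": CORES * L2_KB,
        "l3_mb": CORES * L3_SEGMENT_MB,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

The L3 total comes out to the 32MB of eDRAM cited earlier, which is a quick consistency check on the 4MB-per-core figure.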
AC, Pony Tail
Do you realize what you just wrote?
"....Don't compare a low-end processor with a high-end one...." Why not? The high-end Niagara is several times as fast as the low-end Power6, while Niagara uses 1.6GHz and the Power6+ uses 4.7GHz! If that is not low-end performance from the Power6+, then I don't know what is. Admit it, the Power6+ is slow, uses lots of energy and is really expensive. This IS true, and nothing you say can change it. The Niagara is several times faster in certain server-client benchmarks - no one can deny that. The Niagara uses 1.6GHz, and the Power6+ uses 5GHz, no one can deny that. It is more expensive, no one can deny that. Admit it.
"....If T2 or T3 goes beyond 4 sockets then you can compare them with POWER...." Whoa. Let me tell you, your logic is FAIL. Can you explain why T2 must go beyond 4 sockets? No? Then don't talk about this. (The point of using lots of sockets is to get more performance. But Niagara packs plenty of performance already, 4 of the Niagaras can match 16 Power6+. If one CPU is as fast as 32+ Power6 then I don't see why that CPU is worthless because it only uses one socket. 4 of the Niagara provides more performance than 16 Power6+, hence you don't need more than 4 of those. FAIL LOGIC)
"...It is fairly easy to create an architecture with limited scalability eg. Nehalem, the real challenge is to scale beyond...." True. But if that architecture already kills all performance with one socket, then I don't deem it worthless.
"...By the way, if you love benchmarks so much try to compare current T2 with Nehalem EP which scales about the same...." Why don't you want me to compare T2 against the slow Power6 instead? Is it because you know that the Power6 will lose big time? (Which it does)
"....If the T2+ is that good then why does Sun still try to sell us M-class boxes with Fujitsu chips for any DB or I/O workload?..." The T2+ is that good. On certain workloads. It is not suited for all workloads. This is no secret and everyone knows the T2 sucks on some workloads.
"...I would ask if those T3 threads are simultaneous or if they are round robin KISS ass threads...." Take a wild guess. Is it possible to achieve extreme throughput only with round robin?
"...As I recall Pony tail boy claimed the T2 was 9.6GHz because he tried to multiply 1.2 * 8 cores...." Now, if that were true, then the T2 would be several times as fast as the highest clocked CPU, right? But... how can the T2 be several times as fast as Power6+ - which it is? Hmmm... Schwartz must be correct in some regard, or how can you explain the extreme throughput of the T2? If Schwartz lied, then the T2 would be really dog slow.
Now tell me, is the T2 dog slow, or is it several times faster than the highest clocked CPU? Is Schwartz correct, or is he lying? (Hint: study the architecture of the Niagara and you will see why Schwartz said so. Then you will understand why the T2 doesn't need a large cache. And why it is revolutionizing the old desktop CPU model with high clock speed and large cache).
Kebabbert is a typical Chihuahua owner
If the T2+ is that good then why does Sun still try to sell us M-class boxes with Fujitsu chips for any DB or I/O workload?
I would ask if those T3 threads are simultaneous or if they are round robin KISS ass threads.
SPARC and Itanium are dead. Squeezed out between Nehalem and Power.
As I recall Pony tail boy claimed the T2 was 9.6GHz because he tried to multiply 1.2 * 8 cores.
Pot/Kettle/Black....join us or die ugly
Cheers from the UK
Man, do you realize what you just wrote?
Don't compare a low-end processor with a high-end one.
If T2 or T3 goes beyond 4 sockets then you can compare them with POWER.
It is fairly easy to create an architecture with limited scalability eg. Nehalem, the real challenge is to scale beyond.
If you want a fair comparison, apples to apples, compare your so-called King of the Hill T3 with Nehalem EX when they GA next year.
By the way, if you love benchmarks so much try to compare current T2 with Nehalem EP which scales about the same.