Power7+ chips debut in fat IBM midrange systems
Near the top at first, trickling down to smaller boxes next year
IBM has taken the wraps off the first of its Power Systems machinery to make use of its cache-heavy Power7+ processors, and as El Reg anticipated from the hints in the announcement invitation put out two weeks ago, Big Blue is starting near the top of the line as it upgrades systems that run AIX, IBM i (formerly known as OS/400), and Linux.
As has been the case for the past several generations, the rollout for the Power7+ chips will be a gradual one. "The rest of the products will get the Power7+ next year, with the exception of the Power 795," Steve Sibley, director of worldwide product management for IBM's Power Systems division, tells El Reg. "Just like with the Power 595, we already built the fastest processor and I/O into that machine."
It's tough to argue with the guy in charge of the product line – but it's not impossible. Even if IBM can't crank up the clock speed of the 3.7GHz and 4GHz processors used in the high-end, 32-socket Power 795 machine, the ability to have processors with 2.5 times the L3 cache per core (at 10MB) and better sleep states and Turbo Core modes would no doubt be of use to more than a few Power 795 shops.
If enough customers ask for such a thing, you can bet IBM will sell 'em. Just because Big Blue didn't do it before doesn't mean it can't do it now.
The eight-core Power7+ processor was previewed at the end of August at the Hot Chips 24 chippery fest, and we gave you a peek into its expected performance in the wake of the tech presentation, along with some thoughts on the overclocking potential for the Power7+ chip.
As already divulged, the Power7+ chip will come in one variant that puts a single chip running at a higher clock speed into a socket, and another one that takes two Power7+ chips and crams them into a single socket to double-up the cores, threads, and L3 cache in a socket with what we assume will be a pretty substantial hit to clock speed. IBM calls a regular chip a single-chip module, or SCM, and the double-stuffer a dual-chip module, or DCM.
Die shot of the Power7+ chip from IBM
The Power 770+ and Power 780+ machines announced on Wednesday are based on SCMs, just like the current Power 770 and 780 machines, known as Power7' (Power7 prime) in the internal IBM lingo because these chips were announced in conjunction with a doubling of memory capacity and a shift to PCI-Express 2.0 peripheral slots in these machines in October 2011.
The Power7+ chip has a lot of new features to help accelerate specific functions inside of Power System boxes, including on-chip memory compression, encryption, and hashing algorithms, as well as a random-number generator that cannot be predicted because it is based on random electronic effects on the chip.
The Power7+ chip is implemented in a 32-nanometer process. Specifically, IBM's wafer bakery in East Fishkill, New York, uses a copper/silicon-on-insulator process with high-k metal gates to etch the Power7+ chips, which have 2.1 billion transistors on the die.
Like the Power7 chips before it and the System z mainframe processors, the z11 and z12, the Power7+ implements shared L3 cache using embedded DRAM (eDRAM) instead of the faster static RAM (SRAM). It takes fewer transistors to make a memory cell for eDRAM, so even if it is slower than SRAM, you can jam a lot more cache right next to the processors and thereby speed up the performance of the overall processor by more than you might expect.
The shrink from 45 to 32 nanometers allows Big Blue to put 80MB of L3 cache on the die, plus a slew of accelerators. IBM says that making the Power7+ chip with SRAM for the L3 would have pushed the transistor count up to 5.4 billion, and the resulting chip would also have been larger and therefore very likely to get lower yields on a new process.
In general, Sibley says that the Power7+ processors will deliver about 20 to 30 per cent more performance in the machines in which they will soon be shipping. But considering all of the accelerators, the expanded cache, and the memory compression for AIX (but not for Linux or IBM i) on the chips, customers would be wise to get some capacity planning help from IBM to figure out how their own applications might benefit as they jump from Power5, Power6, or Power7 chips to the new Power7+ chips in Power 770+ or Power 780+ systems. This is particularly true if you are using software-based encryption on any operating system and memory compression for AIX workloads.
For example, if you're using the AIX memory compression that debuted with AIX 7.1 running on Power7 chips, you could get as much as 2X the usable main memory, but by using the two on-chip accelerators that IBM put on the Power7+ chip to run the proprietary compression algorithm for AIX memory compression, you can get up to 2.25X usable main memory and not have the overhead of running the compression algorithms on the Power7+ cores. You get the double benefit of more addressable main memory (4TB can look like 9TB) as well as lower CPU core overhead, allowing the central processors to do more work.
The accelerators, by the way, are in the uncore area of the Power7+ chips and shared by the cores.
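As a rough illustration of the memory compression arithmetic above, the following Python sketch multiplies physical memory by the expansion factors quoted in the article (2X for software compression on Power7, 2.25X with the Power7+ accelerators). The function name `effective_memory_tb` is hypothetical, and the factors are the "up to" figures cited here; real gains will depend on how compressible a workload's data is.

```python
def effective_memory_tb(physical_tb: float, expansion_factor: float) -> float:
    """Apparent memory capacity (TB) for a given compression expansion factor."""
    return physical_tb * expansion_factor


# Figures quoted in the article; both factors are best-case ("up to") values.
PHYSICAL_TB = 4.0
SOFTWARE_FACTOR = 2.0      # AIX memory compression run on Power7 cores
ACCELERATED_FACTOR = 2.25  # same feature using the Power7+ on-chip accelerators

software_tb = effective_memory_tb(PHYSICAL_TB, SOFTWARE_FACTOR)
accelerated_tb = effective_memory_tb(PHYSICAL_TB, ACCELERATED_FACTOR)

print(f"Software compression:    {PHYSICAL_TB} TB looks like {software_tb} TB")
print(f"Accelerated compression: {PHYSICAL_TB} TB looks like {accelerated_tb} TB")
```

The sketch only captures the capacity side; the second benefit described above, fewer core cycles spent on compression, is not modelled.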
Re: Smoke and Mirrors anyone?
"No SPEC CPU2006 benchmarks released on SPARC T4 because Oracle is focused on developing SPARC processors for systems to accelerate commercial software, not HPC, not games."
Then why have they been releasing numbers for every previous generation of the SPARC processors, up to and including the T3 ? Oracle (which took over Sun's SPEC licenses) has license no. 6.
And other vendors have not been shy of giving the public the numbers they need. I remember machines like the HP N4000 with PA-RISC 8600 absolutely hammering for example the IBM M80 with RS64 processors.
Now the T4 was IMHO a great feat of engineering; it's basically the first really generally usable processor of the T line. And if you've ever read, for example, the Hot Chips presentations on the processor, then you'll see that the Oracle processor people actually use relative SPECint/fp to relate the T4 to the T3.
Let's just quote one of the slides:
• Estimate ~5X S2’s SPECint2006* performance
• Estimate ~7X S2’s SPECfp2006* performance
• ~2X S2’s per thread throughput performance
The real clue about why Oracle hasn't released many numbers on the T4 is the last line here... which implies that the overall chip throughput of the T4.. is roughly the same as the T3.
Again, that is a great feat, as Oracle managed to remove some of the stupidity of the T processor line by allowing for relatively good single-threaded throughput in the T4.
Or .. well just look at slide 10.. here it is from the horse's own mouth.. the throughput of the T4 is the same as the T3. So we are talking in the range of 666 on SPECint_rate for a 4 socket system. Kind of a drag when the competition's lowest clocked 4 socket system is doing 1000+, and your own brand spanking new x86 systems are doing 700 .. on 2 sockets. The real problem here is that this does not fit into Larry's marketing machine.
Again the data is from the horse's own mouth. What pisses people like me and others off is that the lack of data makes my job harder. I have to write up the standards and strategy for my company's usage of SPARC, Itanium and Power systems. And the lack of data and facts makes my job harder. Or even worse, it will mean that the guys who have to use the standards I make will get their sizing data wrong.
Now we can agree upon criticising and putting down specCPU2006, it has become a shitty benchmark.. kind of broken some would say but it's there and people use it, even your own beloved Oracle uses it.
As for the world records.. have you been smoking mushrooms ?
Now do I really have to go through them all and show you how crap they are ?
Let's look at
SPARC T4-4 Server Sets First World Record on PeopleSoft HCM 9.1 Benchmark.
Again if you investigate the benchmark.. it's the only... ONLY 9.1 submission.
SPARC T4-4 Server Delivers Best Four-Processor Result on TPC-H Benchmark at 3 TB Scale Factor
It's the only 4 processor system that has a submission. So of course it's the fastest 4 processor system. Is it the fastest ? No bloody way.
SPARC T4-4 Server with Sun FlashFire Technology Delivers Record Performance on PeopleSoft Enterprise Payroll 9.1
3 submissions... M5000, z10 mainframe and then the T4-4 (not really a big field to compete in)
Here they actually manage to beat a mainframe by roughly a factor of 2. But.. the mainframe has a virtual machine with 8 cores for the benchmark + 1 support processor, 24 GB of RAM and then a traditional disk storage system.
The T4-4 has 32 cores 256GB of RAM and flash disks.
So .. the native database of the application and the native language format, 3 times the cores, 10 times the RAM and flash disks and a bare metal installation. Geee... guess mainframes are kind of tough anyway ?
Oracle’s SPARC T4-4 Server with Oracle Database 11g Beats Itanium and POWER7-based Systems on TPC-H Benchmark at 1 TB Scale Factor
Again it is by far not the fastest result, not even using Oracle (which is an HP result).. so it's not a bloody record.
SPARC T4-Based Highly Scalable Solution Posts New World Record on SPECjEnterprise2010 Benchmark
This is actually the first record that they have.. on a little obscure SPEC benchmark with 29 submissions, they actually managed to win one, by throwing a shitload of hardware at this benchmark, as others also have stated here.
SPARC T4 Server Delivers Outstanding Performance on Oracle Business Intelligence Enterprise Edition 11g
To deliver outstanding performance does not make it a world record. And I've searched and searched.. and the only other machine I can find that has run this benchmark is the T5440.. So.. a benchmark only run by Oracle on Oracle hardware beats an older version of the machine... is this a world record ? Technically.... I guess it is ... but honestly ? You've gotta be kidding, how can they post statements like this and not feel ashamed ?
SPARC T4-2 Server Achieves World Record on Oracle E-Business Suite R12 Benchmark
Again in this particular category... the only ... submissions made.. are by Oracle. So again in a field where you are the only one that has submitted a result.. on a benchmark you make yourself.. you hold the world record... HOW NICE... get real.
SPARC T4-2 Server Achieves World Record Results on PeopleSoft Enterprise Financials 9.1 Benchmark
Again... the T4-2 is the only... ONLY server to ever submit a 9.1 benchmark.. there are some M series machines, also Oracle, on version 9 of the benchmark.. but again.. in a field where you are the only one to participate you win and set a world record... it's ridiculous.. do you see the pattern ?
SPARC T4-2 Server Achieves Best Single-System JD Edwards EnterpriseOne Benchmark
This one is also funny.. cause they actually get creamed by an IBM iSeries machine again with internal disks versus flash, and 0.17 second response time versus "Sub second" for the T4.
Again Oracle manage to say that they win the benchmark by saying "78% more Users/rack unit than the IBM Power 770 server. " Man that made me chuckle... user per rack unit... *cackle*
SPARC T4 Servers Set World Record on Siebel CRM 8.1.1.4 Benchmark
Again here the results are few.. but a host of T4 systems with 1.5-2 times the processors (depending on the role in the benchmark), 1.5-2 times the memory, flash disk versus traditional disks, and 7 times the response time manages to do 8000 more users than a setup of POWER servers that does not run at full utilization: 20-80% depending on role in the benchmark.
It's really not impressive.. but sure you can call it a world record.. .but.. again.
SPARC T4-2 Server Tops Industry-Standard, General-Purpose Java Benchmark
This is SPECjvm2008.. it's a benchmark for ... ... PCs. The only real machine that you can compare the T4 benchmark results with is a ... 2009 Apple.. iMac. And it does 50 with 1 chip and 2 cores, versus 450 for the T4-2 with 16 cores and a version of Java that is 8 generations later than the iMac's.
Sooooooooo... it's a world record that you beat a 3 year old iMac running an old Java version by roughly 10% per core ?
Do you know how ridiculous that makes your statement:
"Oracle has published over 14 #1 world record benchmarks on the SPARC T4"
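For anyone who wants to check the per-core comparison on the SPECjvm2008 numbers quoted above, a back-of-the-envelope Python sketch follows. It uses only the scores and core counts given in this post; the function name `per_core_score` is made up for illustration, and dividing a score by core count ignores clock speed, thread count and JVM version differences.

```python
def per_core_score(score: float, cores: int) -> float:
    """Divide a benchmark score evenly across cores (a crude normalisation)."""
    return score / cores


# Scores and core counts as quoted in the post above.
imac = per_core_score(50, 2)     # 2009 iMac: 1 chip, 2 cores
t4_2 = per_core_score(450, 16)   # SPARC T4-2: 16 cores

# Relative per-core advantage of the T4-2 over the iMac, in per cent.
advantage_pct = (t4_2 / imac - 1) * 100

print(f"iMac per core: {imac}")
print(f"T4-2 per core: {t4_2}")
print(f"T4-2 per-core advantage: {advantage_pct:.1f}%")
```

With these inputs the per-core gap comes out at 12.5 per cent, in line with the "roughly 10% per core" figure.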
Oracle Communications ASAP enables Service Activation of over 150 Million Mobile Subscribers on Oracle's Newest SPARC T4-2 Server
Ehh.. this is an internal Oracle benchmark.. it's very hard to dig up anything on this benchmark.. but one statement from a T3 "benchmark test" sprung to mind:
"Oracle used internally developed cryptography performance tests to measure performance."... come on.. this is not an industry benchmark.. it's.. a ridiculous claim.
SPARC T4-2 Servers Deliver New World Record on Oracle's JD Edwards EnterpriseOne Benchmark with Interactive and Batch Components
Here we have a host of 2xT4-2 and a T4-1 and a Flash array against a single little 8 core IBM i POWER7 machine with some internal disks. I mean.. come on, there isn't even a site where you can compare benchmark results. It's x10 the RAM, x5 the cores and flash storage versus internal SAS disks. Come on... it's laughable.
Again all these Oracle product benchmarks are not really industry standard benchmarks.. They are more POC benchmarks.. to show that the solution can be done.
Now do you understand why people think that Oracle is full of it ?
It is quite understandable, when you do the research, why people think Oracle is full of it.
Re: Smoke and Mirrors anyone?
Again.. you really don't have a clue do you ?
1) Naaahh.. you just do a RTFM.
2) Naaah... If you had taken the trouble to read something as simple as the spec sheets, you would be able to see that the POWER 780 machine is now a 4 socket per system unit only machine. Hence you really should compare it to the previous version of that model. That model was a 4-16 socket, 6 core per chip POWER 780 that ran at 3.44 GHz.
The new machine does 128 cores at 3.7 GHz in 16 sockets. Or you can have a 64 core version with 4 cores per chip, that runs at 4.42 GHz. So it's 96 cores at 3.44 GHz versus 128 cores at 3.7GHz or 32 cores at 4.14 GHz versus 64 cores at 4.42 GHz.
Now that is quite an improvement.
3) It's business as usual, this is how they always do it.
4) Read 2.
5) Do you have trouble reading ? 12,560,858 SPECjbb2005 @ 128 cores and 6130 SPECint_rate2006 @ 128 cores for the POWER 780, and 57,024 SAP users for SAP 2-Tier @ 96 cores. ... On these benchmarks it's only surpassed by the POWER 795 and the SGI Altix. Now that is kind of a feat for such a server.
Not to mention some LINPACK numbers that are insane.. x1500 better than last version, guess some of the accelerators kind of meant that IBM broke/cracked that benchmark.
6) Come on.. 8 industry standard benchmark results were released on the POWER 780 on the first day.. that is a factor of 2 over the 4 that have ever.. been released on the T4-4. *Bleh*
7) You don't really get the IO system of the POWER server do you ? Again your "Oracle only" view limits your comprehension of other .. better... solutions.
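To put the numbers in point 2 side by side, here is a crude Python sketch that multiplies cores by clock speed for the old and new POWER 780 configurations quoted above. Cores times GHz is only a rough proxy chosen for illustration (an assumption, not a benchmark, and it ignores the cache and accelerator changes), and the helper name `aggregate_ghz` is hypothetical.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Cores multiplied by clock speed: a crude capacity proxy, not a benchmark."""
    return cores * clock_ghz


# (old configuration, new configuration) pairs as quoted in point 2 above.
comparisons = {
    "max cores": ((96, 3.44), (128, 3.7)),
    "max clock": ((32, 4.14), (64, 4.42)),
}

ratios = {}
for name, (old, new) in comparisons.items():
    old_total = aggregate_ghz(*old)
    new_total = aggregate_ghz(*new)
    ratios[name] = new_total / old_total
    print(f"{name}: {old_total:.2f} -> {new_total:.2f} core-GHz "
          f"({ratios[name]:.2f}x)")
```

On this measure the high-clock configuration gains more than the high-core-count one, mostly because its core count doubles.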
Forget the speed!
Just look at the "stealth" door on the 780 cabinet. Those angles.
Bet if you look in the data hall with radar, that looks just like an iPad.