IBM's Power6 slaughters world+HP in transaction cranking
5.0GHz and $17m run to glory
When IBM started rolling out 5.0GHz versions of Power6, you knew it was only a matter of time before the vendor tried to unseat HP as the transaction performance king. And now it has done just that.
IBM this week posted a new TPC-C result that certifies its Power 595 server as the big daddy of transactions. Running on 32 of the dual-core 5.0GHz chips, the box clocked a TPC-C score of 6,085,166 transactions per minute (tpmC) at $2.81 per tpmC. That blows out a 64-processor (128-core) Itanium system from HP that hit 4,092,799 tpmC at $2.93 per tpmC.
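For the numerically inclined, the published figures roughly reconcile with the $17m sticker price mentioned below. A quick back-of-the-envelope check (our rounding, not TPC's audited pricing):

```python
# Rough sanity check of the published TPC-C figures.
# Total system price ~= tpmC score * price per tpmC (rounded,
# so it won't match TPC's audited pricing to the dollar).

ibm_tpmc, ibm_price_per = 6_085_166, 2.81
hp_tpmc, hp_price_per = 4_092_799, 2.93

print(f"IBM Power 595: ~${ibm_tpmc * ibm_price_per / 1e6:.1f}m")  # ~$17.1m
print(f"HP Superdome:  ~${hp_tpmc * hp_price_per / 1e6:.1f}m")    # ~$12.0m
print(f"Throughput lead: {ibm_tpmc / hp_tpmc - 1:.0%}")           # ~49%
```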
Reality, as many of you know, is not the friend of these types of benchmarks. So, we'll note that the IBM set-up used in the benchmark cost $17m and that the system is not slated to be available until December, according to the TPC-C web site. (IBM, however, says that Power 595s with 32 chips are shipping now.)
That said, the score confirms what everyone expected, which is that IBM's Power6 chips can crank the hell out of business transactions.
"The performance of the Power 595 enables customers to replace three 128 core HP Superdomes (384 cores spanning six computer racks) with two 64 core Power 595 servers (128 cores spanning just two computer racks), reducing the number of processor cores by 66 percent, saving 20 percent on energy costs and 55 percent on software licensing purchased by the core, and reducing floor space by 59 percent," IBM boasted.
HP had owned the TPC-C crown since February 2007 and will certainly submit a new benchmark score once it gets the four-core Tukwila flavor of Itanium later this year. It'll be interesting to see if Tukwila is up to the expected public relations task of besting the 5.0GHz Power6. ®
The result is surprisingly low, actually
TPC-C scores usually scale quite well with memory. IBM have gone from 2TB of RAM to 4TB but only managed a 50% performance improvement. I guess they've hit some architectural limitation with the 595 and need a new box here to keep their shiny new chips fed.
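To put numbers on that (I'm assuming the comparison point is the 4.09m tpmC record above; ideal scaling from doubled RAM would be 2x):

```python
# The "50 per cent" complaint in numbers: doubling RAM (2TB -> 4TB)
# should ideally double the score, but the new result is only ~1.5x
# the old one. The comparison point is assumed to be the 4,092,799
# tpmC record cited in the article.
old_tpmc, new_tpmc = 4_092_799, 6_085_166
speedup = new_tpmc / old_tpmc
print(f"Actual speedup: {speedup:.2f}x vs an ideal 2.00x")  # ~1.49x
print(f"Scaling efficiency: {speedup / 2:.0%}")             # ~74%
```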
TPC-C is a silly benchmark these days. The workload distribution is static, so in a quad/cell-based system the clients can be distributed across the quads and localised; there is little or no inter-quad traffic. It therefore scales on these boxes like no real-world commercial DP application is ever likely to. It's like when they used to run clustered TPC-C scores: with a 4-way cluster, a quarter of your clients connected to each of the 4 nodes and only ever touched data in a quarter of the database, so you got perfect scaling. The same thing is happening in modern large MP systems.

If you could do with your database what the benchmarketing engineers do, you wouldn't need the big iron at all. You'd just cut the database into hundreds of little bits and run them on mini blades. Whereas usually companies buy big iron because they've got a damn great big database and a whole load of access to it is needed.
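Here's a toy sketch of the trick (all names invented): pin each client to the node that owns its slice of the data, and cross-node traffic vanishes, which is exactly the "perfect scaling" I'm complaining about:

```python
# Toy illustration of the partitioning described above: pin each
# client to the node that owns its warehouse range and every
# transaction stays node-local -- embarrassingly parallel, which
# is why it scales unlike most real commercial workloads.
# All names here are hypothetical.

NUM_NODES = 4
NUM_WAREHOUSES = 1000

def home_node(warehouse_id: int) -> int:
    """Static mapping: each node owns a contiguous slice of warehouses."""
    return warehouse_id * NUM_NODES // NUM_WAREHOUSES

def run_transaction(client_warehouse: int, touched_warehouse: int) -> str:
    """A transaction is 'local' only if both warehouses live on one node."""
    if home_node(client_warehouse) == home_node(touched_warehouse):
        return "local"
    return "remote"  # real workloads generate plenty of these

# Benchmark-style setup: clients only ever touch their own warehouse,
# so there is zero inter-node traffic -- hence the 'perfect' scaling.
assert all(run_transaction(w, w) == "local" for w in range(NUM_WAREHOUSES))
```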
Yeah, I mistyped that... I meant to say Moto to PPC to Intel or something, and just bloody mistyped. Anyway, I was well aware of the whole PPC thing, especially as I used to do a lot of work with IBM SP2s and was amazed that Apple got on board with the same technology.
But I'd dispute the idea that an architecture is nothing more than an instruction set. The instruction set makes a whole lot of assumptions about the registers, hardware security mechanisms, addressing modes, memory tiering, pipelining, arithmetic units, etc. Sure, you can build hardware that doesn't meet those assumptions, but the performance will be a failure (that's what emulation does, of course).
In the final analysis, IMHO the PPC architecture is geared for faster clock speeds, with all the attendant power use and heat generation, even if they CAN clock it slower and make a laptop part out of it. The Pentium-M and Core are optimized for slower hardware clock cycles. While the blazing numbers on the PPC are highly interesting, I suspect that future computing increases will look a lot more like Nvidia's CUDA and IBM's Cell/PPC pairings, especially SPMD (single program, multiple data) architectures.
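For anyone who hasn't met the term, SPMD just means one program run in parallel across many data elements; a rough sketch of the idea, with plain Python processes standing in for GPU threads or Cell SPEs:

```python
# Minimal SPMD illustration: one program (the kernel) runs over
# many data elements -- the model CUDA and Cell are built around.
# Plain Python worker processes stand in for GPU threads / SPEs.
from multiprocessing import Pool

def kernel(x: float) -> float:
    """The 'single program' every worker executes on its own datum."""
    return x * x + 1.0

if __name__ == "__main__":
    data = [float(i) for i in range(16)]  # the 'multiple data'
    with Pool(4) as pool:
        results = pool.map(kernel, data)  # same code, many elements
    print(results)
```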
Herby, you're WAY off base
Herby, making a statement like, and I quote, "The price-performance will always be better on X86 architectures, due to the volume and thus pricing of these products. Moving to AIX/Power from HP_UX/Itanium is really like jumping from the fire into the frying pan." is like saying a Ferrari Enzo is a great buy because the tires are cheap. You're looking at one piece of the total cost of ownership: hardware. By the way, labor is the most expensive piece of any IT shop, not hardware. So if you ask any third-line manager or above whether they'd rather spend big bucks on a big machine or save money on the hardware and carry three more headcount, you can guess which one they'll pick.
Hardware pricing is not the only consideration in purchasing a solution. x86 architectures typically require a LOT more personnel to admin, as they are usually many small boxes, each with its own firmware, OS, and apps that need patching, not to mention the complex software like RAC required to get them to move any significant amount of data. Did we get to software licensing yet? By the way, it's by processor, and yes, it takes a lot of dumpy x86 procs to equal one RISC proc. Larger servers, when looked at across total cost of ownership, are MUCH less costly in the long run.
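To put rough numbers on the point (every figure below is invented for illustration; real quotes vary wildly):

```python
# Illustrative-only TCO comparison along the lines argued above.
# Every price, salary, and headcount here is a made-up placeholder;
# swap in real quotes before drawing any conclusions.

def three_year_tco(hardware, admins, cores, license_per_core_year):
    ADMIN_SALARY = 120_000  # fully loaded, per year (assumed)
    yearly = admins * ADMIN_SALARY + cores * license_per_core_year
    return hardware + 3 * yearly

# Many small x86 boxes: cheap iron, lots of cores to license,
# lots of hands to patch firmware/OS/apps on every one of them.
x86 = three_year_tco(hardware=100 * 15_000, admins=6, cores=100 * 8,
                     license_per_core_year=10_000)

# One big RISC box: expensive iron, far fewer cores and admins.
big_iron = three_year_tco(hardware=3_000_000, admins=2, cores=64,
                          license_per_core_year=10_000)

print(f"100-box x86 cluster, 3yr: ~${x86 / 1e6:.1f}m")       # ~$27.7m
print(f"One big RISC box,    3yr: ~${big_iron / 1e6:.1f}m")  # ~$5.6m
```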
Not to mention you make no case for what type of availability the customer requires. I hope you're not expecting your 100-machine x86 cluster to have any kind of uptime.
Herby, stick to Macs and playing games on PCs. Your experience in the business arena is obviously lacking, or limited to a VERY small business.