TPC adds power suckage to benchmarks
Performance (per watt) anxiety
If server makers are already anxious about how big their iron is, they'll now also need to start worrying about how cool they are.
The Transaction Processing Performance Council (TPC) is a consortium of server, operating system, and database software makers that steers the development, running, auditing, and reporting of a suite of online transaction processing and data warehousing benchmark tests. The results of these tests are used for one-upmanship by vendors and as part of purchasing decisions by IT departments, and now the TPC is adding another set of metrics for them to take into consideration: energy.
The TPC-C OLTP test, which has been in use since 1992 and which is in its fifth revision, is arguably the most popular system-level benchmark test in the history of the computer industry. The TPC-C test is a collection of different workloads that are associated with the data processing needed to run a warehouse - a physical warehouse with forklifts and pallets, not one crammed with data being tickled by SQL. The workload includes order processing, inventory, and other operations, and the TPC-C metric is the number of new orders a database can process per minute while supporting the other work.
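As a rough illustration (not the official specification, and with invented numbers), the tpmC figure is simply the count of New-Order transactions completed per minute of the measurement window, even though the mixed workload keeps the other transaction types running alongside:

```python
# Hypothetical sketch of how a tpmC-style figure is derived: count only
# New-Order transactions over the measurement interval, while the rest of
# the mixed workload (payments, inventory checks, etc.) runs in parallel.

def tpmc(transaction_log, window_minutes):
    """transaction_log: list of (txn_type, timestamp) tuples."""
    new_orders = sum(1 for txn_type, _ in transaction_log
                     if txn_type == "new_order")
    return new_orders / window_minutes

# Invented two-minute window: 120 new orders plus 50 payments.
log = ([("new_order", t) for t in range(120)]
       + [("payment", t) for t in range(50)])
print(tpmc(log, 2.0))  # 60.0 new orders per minute
```

The point of the sketch is that only one transaction type counts toward the headline number; the rest exist to keep the system honestly loaded.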
As with all TPC tests, every piece of hardware and software used in a TPC-C run has to be itemized, and vendors have to provide list prices for all of the components as well as the discounts a typical customer would get. The results have to be audited by experts certified by the TPC, and both performance and price/performance metrics are required.
The more modern TPC-E test, which was launched in March 2007, simulates the data processing of an online stock brokerage, and uses real data along with customers that are simulated based on census data from the United States and Canada.
The TPC-E test was designed to be easier to implement, but harder to game. And as you might expect, it has not exactly been popular with server and systems software makers - even though they designed it themselves, by committee, starting in 2005. Only 29 systems have been tested in three years, which makes the TPC-E test basically useless.
The other test that is getting an energy component is the TPC-H data warehousing test, which tests how well or poorly a system or a cluster of systems can process ad hoc queries.
The TPC-Energy spec is an optional component of these three tests, not a requirement, says Mike Nikolaiev, who is chairman of the committee that drafted the spec and who gets his paycheck as the manager of the systems performance group at Hewlett-Packard. While the Standard Performance Evaluation Corporation has a larger family of benchmarks, a number of which measure server performance and a few of which have an energy component, the three TPC tests and their energy metric overlay are distinct in that they require pricing on the systems and independent auditing.
"We want to make sure we have a level playing field here," Nikolaiev said. And in a possible good sign, the 22 members of the consortium who were around to vote on the spec (there are 24 members in total, including all the key server and operating system/database players) unanimously ratified the spec. "That has never happened with a TPC benchmark before," according to Nikolaiev.
The TPC-Energy spec shows vendors how they need to attach power meters to the systems under test, and it will not only look at the electricity consumed by the entire system under test, but also examine the idle power of the same system when it is not processing transactions but kept in a state of being able to process the first transaction.
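A back-of-the-envelope sketch of the arithmetic involved (the sample figures here are invented, not drawn from any audited result): average the power meter readings taken during the measured run, and separately during the idle-but-ready interval, so the two can be reported side by side:

```python
# Hypothetical sketch: average power-meter samples for the active run and
# for the idle period, where "idle" still means ready to take the first
# transaction. All wattage figures below are invented for illustration.

def average_watts(samples):
    """Mean of a list of instantaneous power readings, in watts."""
    return sum(samples) / len(samples)

active_samples = [950.0, 1010.0, 980.0]  # watts, during the benchmark run
idle_samples = [400.0, 410.0, 390.0]     # watts, idle but ready for work

avg_active = average_watts(active_samples)
avg_idle = average_watts(idle_samples)
print(f"active: {avg_active:.0f} W, idle: {avg_idle:.0f} W")
```

Reporting idle power separately matters because a machine that sips power under load but guzzles it while waiting is a very different proposition for a real data center.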
To keep vendors from gaming the test, all elements of the system under test (including any funky cooling elements) have to be commercially available, and vendors have to measure air intake on the system racks and keep an ambient intake temperature of 20 degrees Celsius - no super-chilling the data center to allow a machine to do more work per watt.
Energy use is measured in three parts of the system - the application servers, the database servers, and the storage systems - so vendors can show the relative efficiency of different components. The TPC has also come up with a software toolset called the Energy Measuring System that all system testers will use to monitor energy usage and collect data during the test. This means the collection of energy data will be absolutely consistent across different vendors.
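To make the breakdown concrete, here is a hedged sketch of the kind of report the overlay enables: sum the average power of each subsystem, then express the total relative to throughput (for instance, watts per thousand tpmC). Every number below is invented for illustration:

```python
# Hypothetical example: per-subsystem average power (watts) summed into a
# system total, then normalized by throughput. All figures are invented.

subsystem_watts = {
    "application_servers": 2500.0,
    "database_servers": 4200.0,
    "storage": 8300.0,
}
tpmc_result = 600_000  # invented throughput, new orders per minute

total_watts = sum(subsystem_watts.values())
watts_per_ktpmc = total_watts / (tpmc_result / 1000)
print(f"total: {total_watts:.0f} W, {watts_per_ktpmc:.2f} W per kTpmC")
# total: 15000 W, 25.00 W per kTpmC
```

In this invented example the storage tier burns more than half the total power, which is exactly the sort of comparison the per-subsystem split is meant to expose.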
Nikolaiev is hopeful that the TPC-Energy overlay to the TPC tests will be popular, and with vendors looking for every angle they can find to peddle systems, it seems reasonable that a different kind of arms race will start based on performance per watt instead of just performance.
"The main vendors will jump in right away," says Nikolaiev. "And this will create a lot of peer pressure."
A lack of peer pressure, together with the closing of loopholes, has doomed the better TPC-E benchmark to the back benches after an initial burst of enthusiastic jawboning from the server makers. HP is going to be putting out TPC-Energy results soon, and Nikolaiev says others will start rolling out results in the next couple of months. He expects server makers to run double tests, pitting solid state disks against disk drives within the same server and running the same software stack, to show why SSDs are worth all that extra money. No doubt the first tests will come on systems using the new x64, Itanium, and Power7 processors that are expected in February and March. ®
Well, looking at http://tpc.org/tpce/results/tpce_perf_results.asp it would appear TPC-E favours x86 - where's Sun T-series or IBM Power? If IBM passes on TPC-E when POWER7 gear hits the streets then I'll elevate my intuition to theory. :-)
The one thing Sun did with TPC-C is forever change the storage footprint used by the vendors. IBM will probably want to deliver a smack down to Oracle with POWER7 and DB2 pureScale, but clearly can't afford to do so with >10000 disk spindles. Surely the smack down will leverage IBM SSD kit.
The one thing I really like about TPC-C (say what you will about its applicability in the real world or its simplicity) is that it truly is a real full system stress. That is to say, there are reads *and* writes and unlike SPEC you need disk drives. And apparently to keep up with the I/O bandwidth of large PCIe slot machines with high counts of 8Gb FC cards you need an unreasonable number of disk drives (unreasonable for the real world anyhow).
Let me get this right.
Because the E test was accurate and couldn't be gamed, the server manufacturers won't use it.
Death of hard drives...
For a benchmark aimed at testing servers and databases, TPC-C has increasingly been a test of the major suppliers' ability to marshal huge farms of disk drives, as TPC-C has a wholly untypical balance of server versus I/O resource usage. Until recently we have seen (at the top end) vendors employing upwards of 10,000 enterprise disk drives. For this reason the storage hardware costs have tended to dwarf the server costs at the top end. That's very probably true of power consumption too - some of those storage configs must have been consuming close to a quarter of a megawatt.
The most remarkable thing about the current top-end TPC-C benchmark (from Sun) is that it included a vast amount of flash memory. Of the total database server and storage hardware costs (discounted) of about $10m, two-thirds was down to the flash modules (about 80% was storage hardware in general). That was supplemented by a few hundred slow, 1TB drives.
In contrast IBM's TPC-C (second on the list) had almost 11,000 15K drives - imagine the power consumption of that lot.
More attention to power per unit throughput is surely going to mean that these mega-farms of spinning disks will be replaced in top-end TPC-C benchmarks with either full SSD or hybrid storage arrangements (as Sun has done). Reputedly the Sun configuration used less than 25% of the power of the IBM benchmark.