Original URL: http://www.theregister.co.uk/2012/05/04/is_tpc_ds_worthy_of_weaponising/

Behold the TPC-DS, a new weapon in the global Big Data war

Seeking the 'game-proof' benchmark

By Dan Olds, Gabriel Consulting

Posted in HPC, 4th May 2012 09:46 GMT

There isn’t anything inherently evil about industry standard benchmarks, just as there isn’t anything inherently evil about guns. You know the saying: “Guns don’t kill people – people kill people.” (What about bullets? No, they’re not inherently evil either.)

But in the hands of motivated vendors, benchmarks are weapons to be wielded against competitors with great gusto. So why am I writing about benchmarks? It’s because the Transaction Processing Performance Council (TPC) has released a major new benchmark, TPC-DS, which aims to provide a level playing field for vendors warring over their Big Data prowess.

El Reg's Timothy Prickett Morgan wrote about this new benchmark at length here. In his inimitable way, Tim captured the meat of the story, plus the marrow, gristle, hooves and hide of it too.

I talked with TPC reps last week about TPC-DS and benchmarks in general – learning quite a bit along the way about how TPC-DS is different from (and better than) what we’ve seen before. In my years on the vendor side of the industry, there was nothing better than using a shiny new TPC-C or TPC-D score to smite the living hell out of the server competition. But my efforts to learn enough to explain the results in customer presentations taught me a bit about how vendors perform benchmarks. It’s like learning how sausage is made, but with more acronyms.

There were lots and lots of ways to put together stellar benchmark results on system configurations that a real customer would never buy, with software optimised in ways that wouldn’t work in production environments, all offered at prices that customer would never see.

But benchmarks are a necessary evil, like the Irish and the Dutch. There are differences between systems and the ways vendors implement similar or even identical technology. Two Xeon processors might perform exactly the same when they roll out of Intel’s fab. But after they’ve been incorporated into various ‘industry standard’ systems, get their software loads and start running apps, there will be differences in performance and certainly price/performance.

In a perfect world, customers would be able to do ‘bake-offs’ or ‘try and buy’ exercises before issuing a PO. But these are time-consuming and expensive for both the vendor and customer. It’s worth the effort for a very large or mission-critical installation, but not so much when the deal is smaller or less important. This is where standard benchmarks come in: they give buyers a few more data points to use in their decision-making process.

Benchmark organisations like TPC and SPEC (Standard Performance Evaluation Corporation) arose from the industry’s need for standardised and valid comparisons between systems and solutions. These consortiums are primarily vendor-funded, and vendors certainly play a role in helping shape the benchmarks.

If a benchmark's results are printed in a forest and no one reads them, does it exist?

Building standard benchmarks isn’t easy. Tests need to stress the right parts of the system/solution – mimicking real-world processing as much as possible. They need to be complex enough to stress even the largest system, but at the same time scalable and reasonably easy and inexpensive to run. Safeguards have to be built in so that vendors can’t rig the results or game the tests. And, finally, the rules need to be documented and enforced.

But for a benchmark to be truly successful, it has to be used by vendors and customers alike.

What jumped out during my briefing with TPC was the size and complexity of TPC-DS. Lots of tables, lots of data, and lots of operations. It’s much more complete and complex than TPC-H, with 99 queries compared to only 22 for the venerable H.

One of the newest wrinkles with DS is that the ‘ad hoc’ queries are truly ad hoc. In TPC-D and, to a lesser extent, TPC-H, ad hoc queries could be anticipated and planned for by canny benchmark engineers. With DS, the ad hoc queries are randomized, and there are too many potential permutations to allow for pre-positioning data accurately.

The TPC-DS ‘number’ is a bit complicated to understand – it’s a combination of the single user time to run through 99 queries, multiple users running through 99 queries, and database load times. A valid TPC-DS run has to have the single user run results plus a multi-user run with at least four streams.

There isn’t a maximum number of streams (which is the same as users) that needs to be tested. There also is no maximum response time goal that needs to be satisfied in the multi-user runs, as there is in other benchmarks. The metric is heavily influenced by the multi-user results, rewarding scalability and optimised use of shared resources.

Given that there aren’t maximum response time rules, vendors will end up doing lots of TPC-DS multi-user runs in order to find the sweet spot where they’re supporting a large number of users and achieving their best overall score. The executive summary of the mandatory benchmark disclosure document shows the numbers that went into the overall metric, including number of users supported, response times, and the like.
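For the curious, here is a rough sketch in Python of what that sweet-spot hunt might look like. To be clear, this is not the official TPC-DS formula: the `toy_score` function, its `load_weight` parameter and the timings are all made up for illustration. The only figures taken from the benchmark as described above are the 99 queries per stream and the four-stream minimum. The idea is simply that a score combining single-user time, multi-user time and load time rises with the stream count until the system saturates, then falls away.

```python
QUERIES_PER_STREAM = 99  # from the article: each stream runs all 99 queries
MIN_STREAMS = 4          # from the article: a valid run needs at least four streams


def toy_score(streams, single_user_secs, multi_user_secs, load_secs, load_weight=0.01):
    """Queries per hour over a weighted total elapsed time.

    Illustrative only: the weighting is an assumption, not the TPC-DS specification.
    """
    if streams < MIN_STREAMS:
        raise ValueError(f"need at least {MIN_STREAMS} streams, got {streams}")
    # One single-user pass plus one pass per concurrent stream.
    total_queries = QUERIES_PER_STREAM * (1 + streams)
    # Load time is charged at a small, stream-scaled fraction (hypothetical choice).
    elapsed_secs = single_user_secs + multi_user_secs + load_weight * streams * load_secs
    return total_queries * 3600.0 / elapsed_secs


def best_stream_count(runs, single_user_secs, load_secs):
    """Return (streams, score) for the highest-scoring multi-user run.

    `runs` maps a stream count to its measured multi-user elapsed seconds.
    """
    scored = {s: toy_score(s, single_user_secs, t, load_secs) for s, t in runs.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    # Made-up timings: elapsed time grows gently, then sharply once the system saturates.
    hypothetical_runs = {4: 4000.0, 8: 6500.0, 16: 11000.0, 32: 30000.0}
    streams, score = best_stream_count(
        hypothetical_runs, single_user_secs=1800.0, load_secs=7200.0
    )
    print(f"best stream count: {streams}, toy score: {score:.1f}")
```

With these invented timings the score peaks at 16 streams and drops at 32, which is the sort of curve a benchmark team would be probing for with repeated multi-user runs.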

To me, TPC-DS is a big leap forward. I’m no benchmarking expert, and nothing is totally foolproof, but this looks to be a highly ‘game-proof’ benchmark. It also does a much better job than its predecessors of capturing the complexity of modern business processing.

From a customer perspective, it’s definitely worth consulting TPC-DS before making that next Big Data solution buy. Now we wait and see if the vendors are going to run it through some gear – and thus weaponise it. ®