New standard test of Big Data bang per system buck rolls out
A sim so good you could use it instead of Oracle or SAP?
There's a new big data benchmark in town: TPC-DS.
The Transaction Processing Performance Council still doesn't know how to do its own abbreviation after 24 years of existence, but it does know a thing or two about getting IT hardware and software vendors together and hammering out benchmark tests and pricing metrics to help server, storage, database, and middleware buyers try to figure out what they might want to buy and what kind of value they might expect from what they buy.
The TPC was founded in 1988 following an uproar in the server racket after IBM ran its own RAMP-C COBOL benchmark test on its AS/400 and System/38 minicomputers, pitting them against a bunch of Hewlett-Packard HP 3000s and Digital Equipment VAXes and showing (of course) that the IBM machines won out.
The initial TPC-A debit/credit benchmark that these and other vendors ratified through the TPC consortium was ridiculously simple by modern standards, but then again, so were the things that we were all doing with systems back then.
The TPC-C online transaction processing benchmark, which simulates the data processing associated with running a warehouse (the real kind, with forklifts) and looking up stock items and doing other transactions, is arguably the most successful comparative benchmark in history (certainly among those that provide both performance and pricing), but is getting a bit long in the tooth considering that the whole benchmark test can easily fit in main memory these days and the disk requirements of the TPC-C test are utterly ridiculous.
The TPC-E test, which simulates the data processing of an online brokerage and supports multi-tier configurations, was supposed to be a replacement for TPC-C when it debuted in March 2007, but relatively few TPC-E benchmark test runs have been done in the past five years – El Reg counts 55 machines, and they are all running the Windows stack – so the usefulness of TPC-E can be seriously called into question.
On the decision support/data warehousing front, TPC-D was the original benchmark, but fighting among the vendors in the TPC consortium caused it to be split into TPC-H and TPC-R. Basically, some companies were precompiling routines in the ad hoc query benchmark; that practice was eventually allowed in the TPC-R test, which vendors and users alike eventually shunned as useless.
Work on a follow-on to TPC-H was started back in September 2004, when the TPC-D, TPC-H, and TPC-R tests were all active, and the idea then was to get back to a single test. The hope back then was to get TPC-DS ratified in late 2005 and into production in 2006. Clearly, this took a lot longer than expected.
And the TPC-DS test is now no longer being pitched as a kicker to the TPC-H ad hoc query test for data warehouses, but rather as a totally different test that simulates the big data processing associated with a modern retail operation. And thus, you might think, perhaps the TPC consortium should have called it TPC-BD.
What customers have asked the TPC to come up with is a test that can measure the performance of a single user (what it calls a power test), the performance of many users (a throughput test), and continuous data integration (extract, transform, and load, or ETL, work as well as trickle updates to the database).
The TPC says that the TPC-DS test has realistic table content and table scaling, has non-uniform data, and includes NULL values, which represent real-world database challenges. Furthermore, it has a large de-normalized schema, a large query set, complex queries (including tool-generated queries and modern SQL constructs such as SQL99 and OLAP constructs), and a mix of ad-hoc and reporting queries.
You can see the TPC-DS standard specification here (PDF).
The TPC-DS test models the decision support processing for a hypothetical retailer that has to manage a large number of products and sells its products through a nationwide chain of stores as well as having catalog sales and online sales.
The mix of sales is 50 per cent in brick and mortar shops, 30 per cent through catalogs, and 20 per cent through the online store. The simulated system tracks product inventories and ships them from simulated warehouses. It also records purchases, modifies prices according to promotions, creates dynamic web pages, and updates customer profiles. (If you are looking to start up a retail operation, the TPC-DS code might be cheaper than paying Oracle or SAP ...)
Here's the block diagram of the TPC-DS system:
TPC-DS benchmark block diagram
The simulated application has sales, inventory, shipping, planning, marketing, fraud analysis, and customer analysis modules, and the application includes both transaction databases and a data warehouse. In essence, it is a mix of TPC-C and TPC-H. The data warehouse is de-normalized with multiple snowflake schemas being used to create what is called a snowstorm schema. This setup can support the generation of batch reports, ad hoc queries, iterative OLAP queries, and data mining.
It is complicated and hairy – just like real-world decision support systems. The databases include 26 tables, including 7 fact tables (which hold 99 per cent of the total data in the system) and 19 dimension tables that glue them all together for the various types of queries. The tables have 30 or more columns and have a variety of character, decimal, and integer data and are, of course, indexed like hell. But complex data structures like materialized views, bitmaps, and join indexes are only allowed on the catalog sales channel.
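The fact/dimension split described above can be sketched in miniature. The fact-table names below follow the TPC-DS naming for its seven fact tables, but the dimension lists attached to each are abbreviated and partly guessed, purely to show the shape of the thing:

```python
# A sketch of the snowflake-style layout: each sales channel gets its own
# fact tables, and all of them share common dimension tables. Dimension
# lists here are illustrative, not the complete schema.
FACT_TABLES = {
    "store_sales":     ["date_dim", "time_dim", "item", "customer", "store", "promotion"],
    "store_returns":   ["date_dim", "time_dim", "item", "customer", "store"],
    "catalog_sales":   ["date_dim", "time_dim", "item", "customer", "catalog_page", "ship_mode"],
    "catalog_returns": ["date_dim", "time_dim", "item", "customer", "catalog_page"],
    "web_sales":       ["date_dim", "time_dim", "item", "customer", "web_site", "web_page"],
    "web_returns":     ["date_dim", "time_dim", "item", "customer", "web_page"],
    "inventory":       ["date_dim", "item", "warehouse"],
}

# Dimensions shared by every fact table are what stitch the individual
# snowflakes into the "snowstorm" schema.
shared = set.intersection(*(set(dims) for dims in FACT_TABLES.values()))
print(sorted(shared))  # date_dim and item appear in every snowflake
```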
Just like the TPC-H test, the benchmark will scale not just by the size of systems but also by the size of the dataset in the fact tables that is chewed upon – with variants coming in 100GB, 300GB, 1TB, 3TB, 10TB, 30TB, and 100TB sizes. The static tables don't scale, just like in the real world.
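That scaling behaviour is simple to sketch: fact-table row counts grow linearly with the scale factor, while the static dimension tables keep a fixed cardinality. The base row counts below are made up for illustration:

```python
# The seven published scale factors, expressed in gigabytes.
SCALE_FACTORS_GB = [100, 300, 1_000, 3_000, 10_000, 30_000, 100_000]

def rows_at_scale(base_rows_at_100gb, sf_gb, scales=True):
    """Row count at a given scale factor.

    Fact tables grow linearly with dataset size; static dimension
    tables ignore the scale factor entirely.
    """
    return base_rows_at_100gb * sf_gb // 100 if scales else base_rows_at_100gb

# Hypothetical base cardinalities, purely for illustration:
for sf in SCALE_FACTORS_GB:
    fact = rows_at_scale(250_000_000, sf, scales=True)  # a sales fact table
    dim = rows_at_scale(73_000, sf, scales=False)       # a calendar dimension
    print(f"{sf:>7} GB: {fact:>18,} fact rows, {dim:,} dimension rows")
```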
The TPC-H test adhered to the SQL92 standard and had 22 queries, which were characterized as simple ad-hoc queries. The TPC-DS test has 99 queries, which are a mix of ad-hoc and reporting queries, and they adhere to the more modern SQL99 standard plus some OLAP extensions thrown in that are commonly part of relational databases today. And the queries are, of course, a lot more complex.
The TPC-H test had 8 tables, and only 2 of them were updated, while 22 out of the 26 tables in the TPC-DS test are updated. The TPC-H test had data deleted and inserted randomly to simulate change in the database, but TPC-DS actually does rolling updates of data like a real workload would. The largest TPC-H table had 15 columns, and the largest TPC-DS table has 38 columns.
Here's the algorithm for coming up with the composite TPC-DS benchmark score:
The TPC-DS performance metric
Scale factor is the dataset size, and the metric counts throughput in queries per hour. You load up the databases, which does the updates to the dimensional and fact tables, and you time that. Then you run one user's query stream against it to do the power test. Then you do two runs of the multi-user test, where streams of multiple users add queries. Each user stream executes all 99 queries in a random order and only executes one query at a time.
You load up user streams to boost the overall throughput of the system. You do some normalization for user counts and come up with the final metric in queries per hour. You divide by the cost of the system under test (which does not appear to include maintenance) and there's your TPC-DS measured amount of bang per buck.
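The shape of that calculation can be sketched as follows. This is a deliberately simplified illustration of the phases described above – a timed load, a single-stream power run, and two multi-stream throughput runs – and not the official QphDS@SF formula, which lives in the spec PDF; the dollar figures and timings are invented:

```python
from math import floor

def qph_sketch(load_hours, power_hours, tput1_hours, tput2_hours,
               streams, queries_per_stream=99):
    """Illustrative composite: total query work over total elapsed time.

    NOT the official QphDS@SF metric from the TPC-DS spec; just a sketch
    of 'queries per hour' across the timed phases.
    """
    # Power run: one stream of 99 queries. Each of the two throughput
    # runs executes `streams` full 99-query streams.
    total_queries = queries_per_stream * (1 + 2 * streams)
    total_hours = load_hours + power_hours + tput1_hours + tput2_hours
    return floor(total_queries / total_hours)

def dollars_per_qph(system_cost_usd, qph):
    # The bang-per-buck figure: system price divided by throughput.
    return system_cost_usd / qph

# Hypothetical numbers: 1h load, 2h power run, two 3h throughput runs
# with four concurrent streams, on a $495,000 system.
qph = qph_sketch(1, 2, 3, 3, streams=4)
print(qph, dollars_per_qph(495_000, qph))  # 99 queries/hour, $5,000.0 per qph
```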
Vendors can start using TPC-DS immediately. It will be interesting to see if they do. ®