Good luck, have fun: Thanks Xeon SP, now SPEC benchmarks blurt out hundreds of results

We're supposed to sift through this without a tool?

[Image: a broken camera. You can't see through a broken lens]

Comment: Ah, the good old days of server benchmarks, when CPU variations were few and clock rates ruled the roost. Now we have four different Xeon SP CPU families, one with two model lines, and each family has its own set of CPUs with differing core counts, thread counts and clock rates.

Vendors put these in either blade or rackmount servers with varying socket counts. A benchmark is supposed to help you, dear user, work out which server gives you the best performance per dollar for your workload needs. Some industry-standard test, from an organisation like SPEC, gives each server of interest a performance index number, and you know that server A is better than server B because it has a higher index number.

Simple!

The SPEC organisation's latest CPU2017 benchmark has integer and floating-point variations, each with base (standard compiler) and peak (tweaked compiler) results, and each with a job time (SPECspeed) and throughput (SPECrate) version – that's eight individual benchmark components.
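For anyone who wants to check the sums, here is a minimal Python sketch that enumerates the combinations the paragraph above describes. The labels are our own shorthand for the three axes, not official SPEC metric names.

```python
from itertools import product

# The three axes described above. These labels are our own shorthand,
# not official SPEC metric names.
MODES = ("SPECspeed", "SPECrate")          # job time vs throughput
NUMERIC_TYPES = ("integer", "floating-point")
TUNING = ("base", "peak")                  # standard vs tweaked compiler


def metric_combinations():
    """Return every mode/type/tuning combination as a readable label."""
    return [" ".join(combo) for combo in product(MODES, NUMERIC_TYPES, TUNING)]


if __name__ == "__main__":
    combos = metric_combinations()
    for label in combos:
        print(label)
    print(len(combos), "combinations in total")
```

Every server in the results list can, in principle, report a number for each of these, which goes some way to explaining the flood.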

It also has base and peak energy levels but server vendors typically don't bother to use them, thank the good Lord.

The CPU2017 reported results are openly available on the SPEC website and will get you tearing your hair out in record time. It is just so damnably difficult to work out which servers are best suited to a job.

Take a look at a section of the SPEC CPU2017 integer speed results summary listing:

[Image: section of the SPEC CPU2017 integer speed results summary listing]

We see Cisco blade (UCS B200) and rackmount (UCS C480) servers alongside Dell rackmount (R940) servers, with a mix of Xeon processor brands, model numbers and clock rates. Moving right we see a Parallel column followed by the base thread count, then the enabled cores, chips and threads per core, then the base and peak results, and empty base and peak energy result columns.

These summary entries are neither dated nor priced. To get dates, detailed configurations and per-component test results, you have to go to the individual records, which are available in HTML, CSV, text, PDF and PS (printable) formats.

Here's an extract of one for a Lenovo system:

[Image: sample of nine-page Lenovo server PS listing]

Back in the summary results list, there's no apparent ordering, except by supplier name, so how the heck do you make sense out of it? The entries within a supplier's section aren't ordered by Xeon processor brand, or clock rate and/or core count within brand, or by base or peak result. It's just a great big results brain dump.

Let's suppose some server config whizz has said you need a 2-socket rackmount server with a Xeon Gold 6100-type processor for your workload. Which suppliers have benchmarked these in the SPEC list?

You have to page through 408 entries, looking for 2-socket (remember, SPEC calls sockets "chips"), Xeon Gold 6100-class rackmount servers – Cisco, Dell, HPE, Huawei, Lenovo, Sugon and Supermicro ones are there to be found.

That's right, you spotted it, there are no entries for Hitachi Vantara or Fujitsu. As well as being bewildering to interpret, the list is incomplete.

It also has a trap for the unwary. Yes, you can find a 2-socket Xeon Gold 6100-class Cisco UCS B200 M5 server but it's no use; that's a blade-chassis server, not a rackmount. You also have to know the different suppliers' nomenclature to get at the type of server you need.
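The sort of filter we are pining for is not hard to sketch. The Python below is a rough illustration only: the `Entry` fields, the `FORM_FACTOR_HINTS` lookup and the sample rows are all our own inventions for the purpose, not SPEC's data format or real benchmark entries ("ExampleCo RackBox 2U" is entirely made up). The lookup exists because the summary listing has no form-factor column, so somebody has to know that a UCS B200 is a blade.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One summary row. Field names are hypothetical, not SPEC's own."""
    vendor: str
    system: str
    cpu: str
    chips: int  # SPEC's word for sockets


# Hand-maintained hints, because the listing has no form-factor column.
# Only models named in the article, plus one invented example, appear here.
FORM_FACTOR_HINTS = {
    "UCS B200": "blade",
    "UCS C480": "rackmount",
    "R940": "rackmount",
    "RackBox": "rackmount",  # invented model, for illustration only
}

GOLD_6100 = re.compile(r"Xeon Gold 61\d\d")


def form_factor(system):
    """Guess the form factor from the model name; 'unknown' if no hint matches."""
    for hint, kind in FORM_FACTOR_HINTS.items():
        if hint in system:
            return kind
    return "unknown"


def shortlist(entries, chips=2, cpu_pattern=GOLD_6100, wanted="rackmount"):
    """Keep entries with the right socket count, CPU class and form factor."""
    return [
        e for e in entries
        if e.chips == chips
        and cpu_pattern.search(e.cpu)
        and form_factor(e.system) == wanted
    ]


# Illustrative rows only; these are not real SPEC results.
SAMPLE = [
    Entry("Cisco", "UCS B200 M5", "Intel Xeon Gold 6148", 2),         # the blade trap
    Entry("ExampleCo", "RackBox 2U", "Intel Xeon Gold 6148", 2),      # invented vendor
    Entry("Dell", "PowerEdge R940", "Intel Xeon Platinum 8180", 4),   # wrong CPU, wrong sockets
]

if __name__ == "__main__":
    for entry in shortlist(SAMPLE):
        print(entry.vendor, entry.system)
```

Note that anything with an unrecognised model name is quietly dropped, so a real tool would want to flag the unknowns for a human to classify rather than pretend they don't exist.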

To see how hard this all is, we looked for 2/4-socket rackmount Xeon Platinum 8180 servers with 112 threads, and pored through the list to find some from Cisco, Dell, HPE, Huawei and Supermicro.

We entered these in our own spreadsheet – how stupid is that? – and then charted the base result number for each server and, at last, came up with an easy-to-visualise graphic showing how the servers compared:

[Chart: base results for the 112-thread Xeon Platinum 8180 servers]

It looks usable, and the Huawei product immediately stands out. We notice the winning Huawei 1288H has only two sockets; it has 112 base threads but just 56 cores, yet it is the highest-scoring system. For Huawei, threads seem to outweigh cores, which is unexpected.
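The thread arithmetic is worth spelling out, since two quite different boxes can land on the same 112. A quick Python sketch, assuming the Platinum 8180's 28 cores per chip; the four-socket, one-thread-per-core configuration is our own illustrative assumption about how a larger server might reach the same thread count, not something read off the listing.

```python
CORES_PER_8180 = 28  # Xeon Platinum 8180: 28 cores per chip


def total_cores(chips, cores_per_chip=CORES_PER_8180):
    """Enabled cores across all chips (sockets)."""
    return chips * cores_per_chip


def total_threads(chips, threads_per_core, cores_per_chip=CORES_PER_8180):
    """Hardware threads across all chips."""
    return total_cores(chips, cores_per_chip) * threads_per_core


if __name__ == "__main__":
    # (chips, threads per core): the second pairing is an assumed example.
    for chips, threads_per_core in ((2, 2), (4, 1)):
        print(
            f"{chips} chips x {CORES_PER_8180} cores x {threads_per_core} threads/core = "
            f"{total_cores(chips)} cores, {total_threads(chips, threads_per_core)} threads"
        )
```

Same thread count, half the cores: precisely the kind of detail that the summary list leaves you to work out for yourself.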

There is gold to be mined from this SPEC benchmark spoil heap but why oh why can't the SPEC organisation provide a sensible on-screen tool for extracting relevant results from its summary list? How are we ordinary people meant to use these 400+ entries as a help in server selection when their presentation is so abysmal?

For sure, the SPEC organisation and the server suppliers have their task immeasurably complicated by Intel's asinine and manic over-production of Xeon SP variants, yet they themselves collectively don't help us select and compare their server products.

We need a smartphone-style app to select, extract and compare servers on a performance benchmark. It's not big data, doesn't need AI-driven analytics, and would do the world of server buyers a gigantic favour. ®


Biting the hand that feeds IT © 1998–2018