In-depth: Supermicro's youngest Twin is a real silent ice maiden

Don't look now, folks, Trevor's in luuurve

Fri 27 Mar 2015 // 18:31 UTC

A good fit for hyperconvergence

Perhaps the best example of this is pushing Maxta's MxSP hyperconverged software to the limit. I have now used Maxta with the same set of physical disks and SSDs on four different four-node clusters, and all show very different results.

The oldest cluster, Rubidium, uses Intel 2680 v1 Xeons. Potassium uses 2603 v2 Xeons. Caesium uses 2609 v2 Xeons, and then there is the 2680 v3 Grantley cluster I'm reviewing now. Every cluster except Grantley has 128GB of RAM per node.

The disk loadout used for cross-testing MxSP across all clusters is four Toshiba AL13SEB300 300GB 10K SAS drives and two Micron M500DC 480GB SSDs per node. The SSDs in question have a theoretical 4K random read rate of 65,000 IOPS (60,000 measured in practice) and a random write rate of 35,000 IOPS (30,000 measured in practice). Based on my knowledge of how MxSP works, at the absolute best I should be able to get a practical write IOPS of 240,000 per cluster and read IOPS of 480,000. (MxSP writes a copy of the data to two nodes, so take your total cumulative SSD IOPS and cut it in half.)
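The arithmetic behind those ceilings can be sketched roughly as follows. This multiplies the measured per-SSD figures by the eight SSDs in a four-node cluster. The function name and the `write_copies` parameter are illustrative and not part of MxSP; `write_copies=2` is only an assumption about how writing each block to two nodes might be modelled, and the sketch ignores CPU, network and software overhead entirely.

```python
def cluster_iops_ceiling(nodes, ssds_per_node, iops_per_ssd, write_copies=1):
    """Aggregate SSD IOPS for a cluster, divided by the copies written per I/O.

    Hypothetical helper for illustration: it only sums raw SSD capability.
    """
    return nodes * ssds_per_node * iops_per_ssd // write_copies


NODES = 4            # four-node clusters
SSDS_PER_NODE = 2    # two Micron M500DC SSDs per node
READ_IOPS = 60_000   # measured 4K random read per SSD
WRITE_IOPS = 30_000  # measured 4K random write per SSD

read_ceiling = cluster_iops_ceiling(NODES, SSDS_PER_NODE, READ_IOPS)
write_ceiling = cluster_iops_ceiling(NODES, SSDS_PER_NODE, WRITE_IOPS)
# Assumption: every write lands on two nodes, so usable write IOPS halve.
mirrored_write_ceiling = cluster_iops_ceiling(
    NODES, SSDS_PER_NODE, WRITE_IOPS, write_copies=2
)

print(read_ceiling, write_ceiling, mirrored_write_ceiling)
```

The first two values line up with the 480,000 read and 240,000 write figures quoted above. The third shows what a further two-copy halving would do to the write number; whether the quoted write ceiling already accounts for the second copy is not something this sketch can settle.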

For all tests I outfitted each node with dual 10GbE NICs. Rubidium's ageing 2680 v1 CPUs allow me to push about 150,000 write IOPS and pull an impressive 300,000 read IOPS. Unfortunately, this comes at the cost of almost the total processing power of four cores per node. This is due to the CPU power required to handle the data efficiency features (deduplication, compression, thin provisioning, etc.) that are built into MxSP. I could turn them off, but where's the fun in that?

Potassium's anaemic 2603 v2 Xeons are shite. I can almost – almost – eke out 20,000 IOPS in either direction, but this is flattening four cores to achieve it, which means eating an entire socket per node. Caesium's 2609 v2 chips fare a little better; I see 40,000 IOPS across the cluster, but again, at the cost of an entire socket's worth of cores per node. But lo! Grantley and DDR4 do make a difference! I get the same 150,000 write and 300,000 read IOPS from the 2680 v3 Xeons as I do from the 2680 v1, but I can do so using only two cores per node.

[Image: Those CPU details in full]

Considering that the nodes have 12 cores per socket, that's an entirely acceptable overhead in order to get inline deduplication and compression on that kind of I/O load. Now, of course, real-world numbers are lower. I know MxSP pretty well, so I can align my workloads such that I am getting the maximum possible IOPS from a cluster. It helps to understand how workload mirroring across nodes works. The variability in IOPS across the CPUs shows what a difference the CPU generation (and clock speed) can make.
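To put the CPU cost in perspective, here is a rough sketch that divides each cluster's read IOPS by the cores MxSP consumed across its four nodes. The IOPS and cores-per-node values come from the figures above, with two assumptions: Caesium's "entire socket" is taken to mean four cores, as on Potassium, and Potassium's "almost 20,000" is rounded up to 20,000. The dictionary layout and function name are illustrative only.

```python
NODES = 4  # every cluster tested has four nodes

# name: (read IOPS across the cluster, cores consumed per node)
clusters = {
    "Rubidium (2680 v1)": (300_000, 4),
    "Potassium (2603 v2)": (20_000, 4),   # "almost" 20,000, rounded up
    "Caesium (2609 v2)": (40_000, 4),     # assumption: one socket = 4 cores
    "Grantley (2680 v3)": (300_000, 2),
}


def iops_per_core(cluster_iops, cores_per_node, nodes=NODES):
    """Read IOPS delivered per core consumed, across the whole cluster."""
    return cluster_iops / (cores_per_node * nodes)


results = {
    name: iops_per_core(iops, cores) for name, (iops, cores) in clusters.items()
}

for name, value in results.items():
    print(f"{name}: {value:,.0f} read IOPS per core consumed")

# Share of one 12-core socket that two cores represent on the Grantley nodes.
grantley_socket_share = 2 / 12
print(f"Grantley overhead: {grantley_socket_share:.1%} of one socket")
```

On these numbers Grantley delivers 37,500 read IOPS per core consumed against Rubidium's 18,750, which is the "same IOPS, half the cores" result expressed another way.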
