Original URL: https://www.theregister.com/2012/06/20/mellanox_infiniband_ethernet_isc/

Mellanox FDR InfiniBand pushes PCI-Express 3.0 to the limits

10GE, 40GE can't keep up with IB

By Timothy Prickett Morgan

Posted in HPC, 20th June 2012 00:25 GMT

ISC 2012 If you want to try to choke a PCI-Express 3.0 peripheral slot, you have to bring a fire hose. And that is precisely what InfiniBand and Ethernet switch and adapter card maker Mellanox Technologies has done with a new Connect-IB server adapter.

Mellanox was on hand at the International Supercomputing Conference in Hamburg, Germany this week, showing off its latest 56Gb/sec FDR InfiniBand wares, boasting of InfiniBand's uptake in the Top 500 rankings of supercomputers, and touting its spread into database clusters, data analytics, clustered storage arrays, and other segments of the systems racket.

Mellanox is the dominant supplier now that QLogic has sold off its InfiniBand biz to Intel, and it is milking the fact that it has FDR switches and adapters in the field when QLogic is still at 40Gb/sec QDR InfiniBand. (QLogic, if it had not been eaten by Intel, would counter that it can get the same or better performance from its QDR gear than Mellanox delivers with its FDR gear.) These are good days for Mellanox, which ate rival Voltaire to get into the Ethernet racket and which is enjoying the benefits of the rise of high-speed clusters.

At least until Intel comes back at Mellanox in a big way, pursuing all of its own OEM partners with the Xeon-QLogic-Fulcrum-Cray Aries quadruple whammy. Intel did not buy QLogic, Fulcrum Microsystems, and the Cray supercomputer interconnect business to sit on these assets like so many knickknacks on a shelf.

Intel is going to try to become a supplier of supercomputing interconnects that do all kinds of things and that hook into its Xeon processors and chipsets tightly and seamlessly, and that will eventually make it very tough for Mellanox.

But not so at ISC this year. As El Reg previously reported, for the first time in the history of the Top 500 rankings of supercomputers, InfiniBand has edged out Ethernet, with 208 machines using InfiniBand and 207 using Ethernet. Drilling down into the data a bit, there were 195 machines that used Gigabit Ethernet switches and adapters to link server nodes together, and another 12 that used 10 Gigabit Ethernet.

There are still 78 machines on the list that use earlier InfiniBand gear, but there are 110 machines using QDR InfiniBand, and 20 machines that use FDR InfiniBand. There are a few hybrid interconnects as well on the list that mix InfiniBand with some other network.

The remainder are a mix of custom interconnects like the Cray "SeaStar" XT and "Gemini" XE routers, the Silicon Graphics NUMAlink, IBM's BlueGene/Q, Fujitsu's "Tofu," and a few others. Gigabit Ethernet is by far the most popular of any single speed or type, of course, but it is dramatic how InfiniBand has really blunted the uptake of 10GE networks at the top end of supercomputer clusters. The idea seems to be that if you are going to spend money on anything faster than Gigabit Ethernet, then you might as well skip 10GE or even 40GE and get the benefits of QDR or FDR InfiniBand.
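For the record, the arithmetic of that split tallies up neatly. A quick sketch in Python, using only the counts reported above for the June 2012 list (the handful of hybrid machines muddy the split a little):

    # Interconnect counts from the June 2012 Top 500 list, as cited above
    ethernet   = {"Gigabit Ethernet": 195, "10 Gigabit Ethernet": 12}
    infiniband = {"earlier InfiniBand": 78, "QDR InfiniBand": 110, "FDR InfiniBand": 20}

    eth_total = sum(ethernet.values())      # 207
    ib_total  = sum(infiniband.values())    # 208
    print("Ethernet total:   %d" % eth_total)
    print("InfiniBand total: %d" % ib_total)
    print("Everything else:  %d" % (500 - eth_total - ib_total))   # custom, hybrid, and other interconnects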

This is certainly what Mellanox is hoping customers do, and that is why it is bragging about a new server adapter card called Connect-IB that can push two full-speed FDR ports.

The Connect-IB dual-port InfiniBand FDR adapter card

This new Connect-IB card, which is sampling now, will be available for both PCI-Express 3.0 and PCI-Express 2.0 slots, and eats an x16 slot. Up until now, network adapter cards have generally been x8 devices, with half as many lanes of traffic and therefore a lot less theoretical and realized bandwidth available to let the network chat up the servers. By moving to servers that support PCI-Express 3.0 slots, you can put two FDR ports on each adapter using an x16 slot and still run them at up to 100Gb/sec aggregate across the two ports.
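The back-of-the-envelope math works out, at least on paper. A rough sketch in Python, using the published signaling rates and encoding overheads (real-world throughput comes in somewhat lower once transaction overhead is counted, which is presumably where the 100Gb/sec aggregate figure comes from):

    # Rough, theoretical numbers only; protocol overhead will shave these down further
    lanes = 16
    pcie3_gtps = 8.0                      # PCI-Express 3.0: 8 GT/sec per lane
    pcie3_encoding = 128.0 / 130.0        # 128b/130b encoding
    pcie3_gbps = lanes * pcie3_gtps * pcie3_encoding    # ~126 Gb/sec per direction

    fdr_signal_gbps = 56.0                # 4x FDR: four lanes at 14 Gb/sec
    fdr_encoding = 64.0 / 66.0            # 64b/66b encoding
    fdr_data_gbps = fdr_signal_gbps * fdr_encoding      # ~54.3 Gb/sec of payload per port

    print("PCIe 3.0 x16, per direction: %.0f Gb/sec" % pcie3_gbps)
    print("Two FDR ports, payload:      %.0f Gb/sec" % (2 * fdr_data_gbps))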

If your server is using older PCI-Express 2.0 slots – and at this point, that means anything that is not using an Intel Xeon E5-2400, E5-2600, E5-4600, or E3-1200 v2 processor since no other server processor maker is supporting PCI-Express 3.0 yet – then there is an x16 Connect-IB card that has one port that you can try to push all the way up to 56Gb/sec speeds.

These new cards have an MPI ping latency of around one microsecond and support Remote Direct Memory Access (RDMA), which is one of the core technologies that gives InfiniBand its performance edge over Ethernet and which allows servers to reach across the network directly into each other's main memory without going through that pesky operating system stack. Mellanox says the new two-port Connect-IB card can push 130 million messages per second – four times that of its competitor. (That presumably means you, QLogic, er, Intel.)
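To put that message rate in perspective, here is a bit of hedged arithmetic in Python, using the 130 million messages per second figure and FDR's theoretical payload rate (the real crossover point depends on protocol headers and the application):

    # At what message size does raw bandwidth, rather than message rate, become the limit?
    msg_rate = 130e6                                     # messages/sec claimed for the dual-port Connect-IB
    fdr_payload_bytes = 2 * 56e9 * (64.0 / 66.0) / 8     # two FDR ports: ~13.6 billion bytes/sec of payload

    crossover = fdr_payload_bytes / msg_rate
    print("Crossover message size: ~%.0f bytes" % crossover)   # roughly 104 bytes
    # Below that size the adapter's message rate is the bottleneck;
    # above it, the wire (and the PCIe slot behind it) is.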

There are also single-port and dual-port options on Connect-IB cards that slide into x8 slots. It is not clear how much data these x8 cards can really push, and until they are tested in the field, Mellanox is probably not even sure.

In theory, an x8 slot running at PCI-Express 3.0 speeds should be able to do about 8GB/sec (that's bytes, not bits) of bandwidth in each direction, for a total of 16GB/sec across that x8 link. A single FDR port, which delivers roughly 7GB/sec of payload once encoding overhead is stripped out, should not saturate that x8 link; two full-speed ports would.

What is certain is that a PCI-Express 2.0 slot is a squeeze for FDR InfiniBand: an x8 slot offers only about 32Gb/sec (4GB/sec) of bandwidth each way, well short of a single 56Gb/sec port, and even an x16 slot, at roughly 64Gb/sec (8GB/sec) each way, gets too close to the ceiling for comfort. Hence the single port on the PCI-Express 2.0 version of the card.
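For completeness, here is the same arithmetic run across the slot types in question, as a Python sketch of theoretical per-direction peaks (assuming the standard encoding overheads; real throughput is lower):

    # Theoretical per-direction payload bandwidth, in Gb/sec, after encoding overhead
    def pcie_gbps(lanes, gen):
        if gen == 2:
            return lanes * 5.0 * (8.0 / 10.0)      # PCIe 2.0: 5 GT/sec per lane, 8b/10b
        return lanes * 8.0 * (128.0 / 130.0)       # PCIe 3.0: 8 GT/sec per lane, 128b/130b

    fdr_port = 56.0 * (64.0 / 66.0)                # ~54 Gb/sec of payload per FDR port

    for gen, lanes in [(3, 16), (3, 8), (2, 16), (2, 8)]:
        slot = pcie_gbps(lanes, gen)
        print("PCIe %d.0 x%-2d: %3.0f Gb/sec per direction -> room for %d full-speed FDR port(s)"
              % (gen, lanes, slot, int(slot // fdr_port)))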

Now, the ConnectX chips on the Mellanox adapters as well as the SwitchX ASICs at the heart of its switches swing both ways, Ethernet and InfiniBand, so don't jump to the wrong conclusion and think Mellanox doesn't love Ethernet.

The company was peddling its 40GE adapters and switches, which support RDMA over Converged Ethernet (RoCE) and which give many of the benefits of InfiniBand to customers who don't want to build mixed InfiniBand-Ethernet networks. (Or, perhaps more precisely, they want Mellanox to do the mixing inside of the switch and inside of the adapter cards and mask the transformation from the network.) Mellanox says that it is seeing up to an 80 per cent application performance boost using its 40GE gear end-to-end compared to 10GE networks on clusters.

Mellanox also announced that its latest FDR InfiniBand adapters will support Nvidia's GPUDirect protocol, a kind of RDMA for the Tesla GPU coprocessors that has, until now, allowed GPUs inside a single machine to access each other's memory without going through the CPU and OS stack to do it.

With the current Tesla K10 and future Tesla K20 GPU coprocessors, GPUDirect will allow for coprocessors anywhere in a cluster to access the memory of any other coprocessor, fulfilling Nvidia's dream of not really needing the CPU for much at all. This GPUDirect support will be fully enabled in Mellanox FDR adapters. ®