Mellanox: We're gonna make InfiniBand great again – 200Gbps great

So great, offload as much as possible from CPUs, the greatest interconnect ever

InfiniBand will go from 100Gbps to 200Gbps next year – and The Register spoke to Mellanox's marketing veep Gilad Shainer to find out what to expect.

What's coming from Mellanox is a bottom-to-top offering for the 200Gbps HDR InfiniBand spec, Shainer said, covering switches, chips, NICs and suitable cabling.

The upcoming Quantum switch device supports 40 ports of 200Gbps HDR InfiniBand or 80 ports at 100Gbps – in a modular switch, that scales to 800 ports of 200Gbps or 1,600 ports at 100Gbps. Switch latency is 90ns and aggregate capacity is 16Tbps.
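
The port counts and the 16Tbps figure line up if the aggregate capacity counts traffic in both directions; that reading is our assumption, not something Mellanox spelled out. A quick sketch of the arithmetic, with function and constant names invented for illustration:

```python
# Back-of-the-envelope check of the Quantum switch figures quoted above.
# Assumption: "aggregate capacity" counts both directions of every port.
HDR_PORTS = 40      # ports when running at 200Gbps
HDR_GBPS = 200
SPLIT_PORTS = 80    # ports when running at 100Gbps
SPLIT_GBPS = 100


def aggregate_tbps(ports, gbps_per_port, bidirectional=True):
    """Return total capacity in Tbps for a given port count and port speed."""
    one_way = ports * gbps_per_port / 1000  # Gbps -> Tbps
    return one_way * 2 if bidirectional else one_way


hdr_total = aggregate_tbps(HDR_PORTS, HDR_GBPS)
split_total = aggregate_tbps(SPLIT_PORTS, SPLIT_GBPS)
print(hdr_total, split_total)
```

Both configurations come out at the same total, which is what you would expect if each 200Gbps port is simply split into two 100Gbps ports.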

The ConnectX 200Gbps adapter device has latency of 0.6 microseconds, supports PCIe gen-3 and gen-4, and includes Mellanox's multi-host support (so if you don't need 200Gbps, you can split the adapter between multiple hosts).

If you're yawning over yet-another doubling in speed, perhaps more interesting is Mellanox's continued push to make the switch an offload processor.

There's an emerging need to "analyze data everywhere, particularly when the data is being moved," Shainer said. With an eye to high-performance computing environments, the InfiniBand HDR devices also expand network computing and adaptive routing capabilities, both of which will be useful in environments running the previous 100Gbps generation.

The offload story started years ago with RDMA (remote direct memory access), which means that moving data around takes less than 1 per cent of the CPU's time, Shainer said. That's been expanded in the Quantum and ConnectX.

"The [Quantum] switch will have the ability to execute data aggregation and reduction protocols, offloading those from the CPU," he said, adding that machine learning training algorithms use the same basic concepts.

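To make the idea of a reduction concrete, here is a minimal sketch of the kind of operation being described: each host contributes a vector, something in the middle sums them element by element, and every host gets the same result back. This is a plain Python simulation under our own assumptions, not Mellanox's protocol; the function names and the sample "gradient" values are hypothetical.

```python
from typing import List


def switch_reduce(vectors: List[List[float]]) -> List[float]:
    """Hypothetical switch-side step: element-wise sum of one vector per host."""
    if not vectors:
        raise ValueError("need at least one host vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all host vectors must have the same length")
    # zip(*vectors) groups the i-th element from every host together.
    return [sum(column) for column in zip(*vectors)]


def allreduce_via_switch(vectors: List[List[float]]) -> List[List[float]]:
    """Every host receives its own copy of the same reduced vector."""
    reduced = switch_reduce(vectors)
    return [list(reduced) for _ in vectors]


# Three hosts, each holding a two-element vector (made-up values).
host_gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
results = allreduce_via_switch(host_gradients)
print(results[0])
```

The point of doing the sum in the network is that no host CPU has to collect and add up everyone else's data; in this sketch that work lives entirely in `switch_reduce`.
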
The ConnectX adapter joins in, getting in-network memory, and encryption and other security capabilities. MPI – the message passing interface in supercomputing environments – is also part of the ConnectX offload story, with collectives and matching to cut down the CPU load. Shainer said the ConnectX capabilities mean "60 per cent to 70 per cent of MPI is offloaded to the network ... one day, the entire MPI framework will migrate to the network."
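
For readers unfamiliar with MPI matching: an incoming message is paired with a posted receive by source and tag, with wildcards allowed, and messages that arrive early wait in an "unexpected" queue. The sketch below models that bookkeeping in software to show what would be moved off the CPU. It is a simplified illustration only (communicators are ignored, and the class and method names are hypothetical); it says nothing about how ConnectX implements it.

```python
from collections import deque

ANY = -1  # stand-in for the MPI_ANY_SOURCE / MPI_ANY_TAG wildcards


class Matcher:
    """Toy model of MPI receive matching on (source, tag)."""

    def __init__(self):
        self.posted = deque()      # receives waiting for a message
        self.unexpected = deque()  # messages that arrived before a receive

    @staticmethod
    def _matches(recv, msg):
        r_src, r_tag = recv
        m_src, m_tag, _payload = msg
        return r_src in (ANY, m_src) and r_tag in (ANY, m_tag)

    def post_recv(self, source, tag):
        """Post a receive; return a payload if an earlier message matches."""
        recv = (source, tag)
        for msg in self.unexpected:
            if self._matches(recv, msg):
                self.unexpected.remove(msg)
                return msg[2]
        self.posted.append(recv)
        return None

    def arrive(self, source, tag, payload):
        """Handle an incoming message; return the receive it satisfied, if any."""
        msg = (source, tag, payload)
        for recv in self.posted:  # oldest posted receive is checked first
            if self._matches(recv, msg):
                self.posted.remove(recv)
                return recv
        self.unexpected.append(msg)
        return None


matcher = Matcher()
early = matcher.arrive(0, 7, "a")         # no receive posted yet
got = matcher.post_recv(ANY, 7)           # picks up the early message
pending = matcher.post_recv(1, ANY)       # nothing waiting, so it queues
satisfied = matcher.arrive(1, 3, "b")     # matches the queued receive
```

Every message a busy MPI job sends goes through a search like this, which is why pushing it into the adapter can free up meaningful CPU time.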

Storage offloading is part of the story as well, he explained, because checkpointing (saving the state of an application so it can resume from that point after a crash) is currently part of the CPU workload. "That's critical if you're running thousands of nodes, and you don't want to have to restart the application."

That checkpointing takes CPU time that HPC admins would rather not spend on housekeeping, so the ConnectX can enable background checkpointing.

The adapter's crypto offloading adds an interesting wrinkle to on-disk encryption. If you're using full-disk encryption, then data protection isn't related to the individual user. "But when you do it on the network, the network can have different keys for different users, or different applications," according to Shainer.
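
One way to picture "different keys for different users" is a key derived per owner from a single master secret, as opposed to one key for the whole disk. The sketch below uses HMAC-SHA256 from Python's standard library as the derivation step. Shainer did not describe how keys are managed, so the scheme, the function name and the sample values are all our assumptions.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, owner: str) -> bytes:
    """Derive a 32-byte key for one user or application (illustrative only)."""
    return hmac.new(master_key, owner.encode("utf-8"), hashlib.sha256).digest()


# Made-up master secret and owner labels, purely for demonstration.
master = b"example-master-secret"
alice_key = derive_key(master, "user:alice")
bob_key = derive_key(master, "user:bob")
app_key = derive_key(master, "app:simulation")
```

With full-disk encryption there is effectively one key in play; here each user or application ends up with its own, so data written by one is unreadable with another's key.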

Quantum and ConnectX also add telemetry, with built-in hardware sensors providing real-time data collection.

The 200Gbps performance will be supported by HDR copper cables and splitters for in-rack connections up to 3 metres (3.3 yards), active silicon-photonics optical cables for in-data-center 100-metre (109-yard) links, and optical transceivers for 2,000-metre (2,187-yard) links.

The new gear is due to ship in 2017. ®


Biting the hand that feeds IT © 1998–2017