Voltaire brings InfiniBand switch to the masses
Accelerators speed up cluster work
InfiniBand and Ethernet switch maker Voltaire this morning rolled out its Grid Director 4200, a midrange 40 Gb/sec InfiniBand switch that shoots the gap between its entry and high-end switches - and the product that Voltaire expects companies to buy as they adopt InfiniBand for database clustering and other HPC jobs.
According to Asef Somekh, vice president of marketing at Voltaire, the quad data rate Grid Director 4200 is aimed at companies that have more modest needs than the big supercomputer labs that the Grid Director 4700 was primarily designed for.
The Grid Director 4700 launched last June, sporting 324 InfiniBand ports running at the full-tilt-boogie of 40 Gb/sec on its 51.8 Tb/sec backplane, with the capability of doubling up to 648 ports if you need to go nuts building a huge cluster - which plenty of HPC labs do, and which is why Voltaire put the monster switch into the field first. Now that the technology has ramped and the economy is recovering a bit, Voltaire is ready to drop a modular switch that will be more appealing to companies with more modest InfiniBand needs.
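As a sanity check on those numbers, the quoted backplane figure lines up with the doubled-up port count times the QDR line rate in both directions. This is our back-of-the-envelope arithmetic, not Voltaire's own math:

```python
# Back-of-the-envelope check of the Grid Director 4700's quoted backplane
# bandwidth: 648 QDR ports, each carrying 40 Gb/sec in each direction.
ports = 648          # doubled-up port count
rate_gbps = 40       # quad data rate (QDR) InfiniBand line rate
bidirectional = 2    # full duplex: traffic flowing in and out at once

backplane_tbps = ports * rate_gbps * bidirectional / 1000
print(backplane_tbps)   # 51.84, matching the quoted 51.8 Tb/sec
```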
The new Grid Director 4200 is aimed at companies that need more bandwidth and ports than the Grid Director 4036E, a 1U box that debuted in January with 34 QDR InfiniBand ports as well as two ports that link into Gigabit or 10 Gigabit Ethernet networks. The Grid Director 4036E is really aimed at financial services companies - brokerages and hedge funds mostly - that are co-locating their trading systems at stock exchanges and need to support both InfiniBand and Ethernet protocols but do so in compact form factors with low energy consumption and super-low latencies.
The InfiniBand ports of the Grid Director 4036E are rated at 100 nanoseconds on a port-to-port hop, and the jump from the Ethernet to InfiniBand networks through the gateway adds about 2 microseconds. There is a Grid Director 4036 that has 36 QDR InfiniBand ports and no gateway as well.
The Grid Director 4200 is a modular switch that you add line and fabric boards into, like the 4700 machine, rather than a rack-mounted, sealed box, like the 4036 boxes. The 4200 has an 11U chassis and has room for nine line boards and four fabric boards and a backplane rated at 11.5 Tb/sec; it can host up to 162 QDR ports, and unlike the 4700, the 4200 does not allow you to double up. Port latency on this machine ranges from 100 to 300 nanoseconds, according to Voltaire.
This size machine will hit the sweet spot of the commercial (rather than the technical) HPC market, according to Somekh. The Grid Director 4200 will be available at the end of March. Pricing has not been set yet, and would likely not be divulged even if it were, because that is the way of the high-end networking racket, as it is with high-end servers and storage arrays.
In addition to the new InfiniBand switches, Voltaire has tweaked its Unified Fabric Manager software, which manages the InfiniBand and Ethernet switches and blades as well as accelerating their traffic, to include a new feature called Fabric Collective Accelerator. While the FCA feature will eventually accelerate different kinds of operations commonly performed on HPC clusters, its first iteration juices the speed of collective operations - those that broadcast data, gather data, or otherwise synchronize the nodes in a cluster - in the Message Passing Interface (MPI) protocol commonly used in supercomputing clusters.
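For readers who have not bumped into MPI collectives, the behavior being accelerated looks like this. A toy simulation in plain Python, not real MPI code or Voltaire's implementation:

```python
# A toy illustration of an MPI-style collective, simulated in plain Python
# (no MPI runtime involved). An allreduce combines one value from every
# rank (node) and hands the combined result back to all of them - every
# rank must wait until the exchange completes, which is why collectives
# dominate waiting time on big clusters.
def allreduce(values_by_rank, op=sum):
    """Simulate an MPI allreduce over a list of per-rank values."""
    combined = op(values_by_rank)            # the 'reduce' half
    return [combined] * len(values_by_rank)  # the 'broadcast' half

ranks = [1, 2, 3, 4]       # one value per cluster node
print(allreduce(ranks))    # every rank ends up with 10
```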
Last fall, you will remember, switch and adapter maker Mellanox rolled out a golden screwdriver upgrade for its ConnectX-2 host channel adapters that allowed them to similarly accelerate MPI collective operations. The Mellanox cards also sport a floating point math unit that can do some of the number-crunching work in HPC applications, freeing up CPUs on the server nodes in the cluster to do even more work.
Somekh says that the acceleration of MPI collective operations properly belongs in the switch, not in the adapter cards, something that Mellanox may eventually add to its own switches. (Who says there can't be acceleration on both ends of the wire, and that it cannot be coordinated? The real question is what happens when a Voltaire switch and a Mellanox HCA both try to accelerate MPI operations.)
Somekh adds that what Mellanox is doing on its cards is reducing the size of MPI messages, which is helpful, but what Voltaire is doing at the switch level is cutting the number of MPI messages that are flying around between server nodes in the cluster.
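How that difference plays out in message counts can be sketched with a simple model. The all-to-all and central-aggregator schemes here are generic illustrations of the two approaches, not Voltaire's actual algorithm:

```python
# A rough model of why aggregating collectives in the switch cuts message
# counts. In a naive all-to-all exchange, every node sends to every other
# node; if the switch (or any central aggregator) combines the traffic,
# each node sends once and receives once.
def naive_messages(n):
    return n * (n - 1)   # all-to-all: each of n nodes sends to n-1 peers

def aggregated_messages(n):
    return 2 * n         # one send to, one receive from the aggregator

for n in (16, 162):      # 162 = a fully loaded Grid Director 4200
    print(n, naive_messages(n), aggregated_messages(n))
```

At the Grid Director 4200's full complement of 162 ports, the gap between the two schemes is two orders of magnitude.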
According to Somekh, depending on the workload, anywhere from 50 to 80 percent of the time, a cluster is doing these MPI collective operations rather than just running calculations. That is a lot of waiting for data to either come or go.
On early benchmark tests, Voltaire has been able to reduce the waiting time on MPI collective operations by as much as a factor of ten. Importantly, the use of the FCA feature to goose MPI applications requires no changes to applications running on the cluster. The FCA feature is loosely based on the messaging accelerators that Voltaire created specifically for the financial services industry, which are also a separately sold add-on to Unified Fabric Manager.
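Plugging Somekh's percentages and that factor of ten into Amdahl's law gives a feel for what the feature could be worth overall. This is our arithmetic, not a Voltaire benchmark claim:

```python
# Overall cluster speedup if collective time shrinks by 10x, per Amdahl's
# law, using Somekh's 50-to-80 percent figures for time spent in collectives.
def speedup(collective_fraction, factor=10):
    # time left = untouched compute time + accelerated collective time
    return 1 / ((1 - collective_fraction) + collective_fraction / factor)

print(round(speedup(0.5), 2))   # 1.82x overall at 50 percent collectives
print(round(speedup(0.8), 2))   # 3.57x overall at 80 percent collectives
```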
Eventually, Voltaire will tweak the FCA feature so it can accelerate other kinds of work where similar gathering and broadcasting is done on a big cluster. For example, Map/Reduce-style big data crunching is the next workload that Voltaire will accelerate as part of the FCA add-on to Unified Fabric Manager. ®