Mellanox cranks up InfiniBand switches
648 ports and 40 Gb/sec are the magic numbers
With all of the talk about 10 Gigabit Ethernet in the commercial server racket these days, it is easy to forget that the king of high bandwidth in server networking is still InfiniBand, which runs at 40 Gb/sec in the new round of products that are coming out. Mellanox Technologies, which makes chips for processing the InfiniBand protocol, is the latest vendor to kick out its own quad-data rate (QDR) switches.
Like Sun Microsystems and Voltaire, which have their own QDR InfiniBand switches, Mellanox tops out its IS5000 family of modular switches at 648 QDR ports. Sun previewed its updated InfiniBand switch last November at the Supercomputing 2008 show, and is expected to start talking it up at International Supercomputing Conference '09 in Hamburg, Germany, this week. Voltaire launched a 648-port QDR director switch back in December and has begun shipments, even as it expands into the 10 Gigabit Ethernet market with a 288-port switch announced two weeks ago.
The Mellanox IS5000 modular switch announced today supports either 20 Gb/sec (DDR) or 40 Gb/sec (QDR) InfiniBand links and packs up to 648 ports in a 27U chassis. The switch is modular, which means there are different configurations available to support varying levels of port count and bandwidth. All of them sport Mellanox' own InfiniBand chips, which support up to 18 ports in a non-blocking configuration. The modular switches put this chip on a single blade with 18 ports and these blades slide into the chassis, which holds 36 of the blades - known as leaf blades - in the top-end configuration.
The IS5100 has 108 ports and three spine modules, offering up to 8.64 Tb/sec of bandwidth. The IS5200 doubles that up to 216 ports and 17.28 Tb/sec of bandwidth, and the IS5600 takes it up by another factor of three to reach 648 ports and 51.8 Tb/sec of bandwidth. Daisy chaining a bunch of these IS5600 switches together can support thousands or tens of thousands of server nodes in a parallel cluster, says John Monson, vice president of marketing at Mellanox.
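The bandwidth figures quoted for the three modular models are consistent with counting every 40 Gb/sec port in both directions. Here is a minimal Python sketch of that arithmetic. The full-duplex accounting is an assumption inferred from the numbers, not something Mellanox states in the article, and the function names are made up for illustration.

```python
QDR_GBPS = 40        # per-port QDR rate quoted in the article, in Gb/sec
PORTS_PER_LEAF = 18  # ports on each leaf blade, per the article


def aggregate_tbps(ports, link_gbps=QDR_GBPS, full_duplex=True):
    """Aggregate switch bandwidth in Tb/sec.

    Assumption: each port is counted once per direction (full duplex).
    """
    directions = 2 if full_duplex else 1
    return ports * link_gbps * directions / 1000


def leaf_blades(ports):
    """Number of 18-port leaf blades needed to reach a given port count."""
    return ports // PORTS_PER_LEAF


# Port counts for the modular models named in the article.
MODELS = {"IS5100": 108, "IS5200": 216, "IS5600": 648}

for name, ports in MODELS.items():
    print(f"{name}: {ports} ports, {leaf_blades(ports)} leaf blades, "
          f"{aggregate_tbps(ports):.2f} Tb/sec")
```

Under that assumption the 648-port chassis works out to 51.84 Tb/sec, which the article rounds to 51.8 Tb/sec.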
Today, prior to the opening of the ISC '09 event tomorrow, Mellanox is also rolling out two new QDR fixed switches based on the same electronics as the IS5000 family. The IS5030 is a 36-port switch with 2.88 Tb/sec of bandwidth that has one management port and chassis-level management included. The IS5035 adds a second management port and support for the FabricIT fabric-level management software that is included with the IS5000 director InfiniBand QDR switches.
Mellanox has already launched a fixed-port edge switch with QDR InfiniBand support (the MTS3600, which has 36 ports) and a fixed-port director switch (the MTS3610, which has 324 ports). The company also sells its BX4020 and BX4010 gateway switches, which can link QDR InfiniBand networks seamlessly (and statelessly without much overhead, according to Monson) to 10 Gigabit Ethernet networks or to Fibre Channel storage networks that run at between 2 Gb/sec and 8 Gb/sec speeds. Dell, Hewlett-Packard, and Fujitsu are also reselling 36-port QDR InfiniBand switches that are based on Mellanox chips.
Mellanox, like other switch vendors, is not big on providing pricing for its switches, not least because there is a lot of discounting that goes on in the switching space. But when I suggested that it might cost somewhere around $500 per port to get one of these IS5000 series modular switches, Monson didn't say I was wrong.
He added that fixed-port switches with relatively few ports tend to cost a little less, and that bigger switches with lots more ports tend to cost more per port. Presumably you pay a premium for a modular switch of a given capacity compared to a fixed switch of equal capacity, because of the extra work and electronics involved in making it extensible.
Monson says that the gateways mentioned allow Mellanox to participate in network convergence without being a zealot about one protocol or the other. "Our vision is simple: It shouldn't really matter what protocol you want to run," explains Monson, referring to its Virtual Protocol Interconnect and the adapters that support the convergence of InfiniBand, 10GE, and Fibre Channel over either one. By contrast, Cisco Systems' vision of the commercial data centre, as embodied in its "California" Unified Computing System, is based on a converged 10 Gigabit Ethernet backbone that supports Fibre Channel over Ethernet.
While generalizing a bit, Monson says that the increase in speed and bandwidth of the InfiniBand protocol has not only allowed parallel supercomputer clusters to scale out, it has also allowed them to run more efficiently at the processor level. Monson put together some numbers from the past couple of Top 500 supercomputer rankings, based on the Linpack Fortran benchmark tests that are used to rank the machines, which suggested that the typical cluster using Gigabit Ethernet was able to run at about 52 per cent efficiency. (That percentage is the result of dividing the maximum performance of a machine, in gigaflops, on the Linpack test by its peak theoretical floating point performance.)
For machines linked by 10 Gigabit Ethernet switches, the efficiency rose to 61 per cent. But machines lashed together using 20 Gb/sec InfiniBand are hitting 74 per cent efficiency, and Mellanox will demonstrate this week, on some new machines added to the June 2009 Top 500 list that comes out on Tuesday, that it can drive 92 per cent efficiency with 40 Gb/sec InfiniBand. The choice of protocol and bandwidth certainly does make a difference at the system level (as you would expect).
It would seem that you can get your flops by spending money on servers or on networking, but either way, you are going to spend more money than you thought you had to.
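To put rough numbers on that trade-off, here is a small Python sketch using the efficiency figures Monson quoted. The efficiency calculation is sustained Linpack performance divided by theoretical peak, as described above. The 100 teraflops target and the function names are hypothetical, chosen only for illustration.

```python
# Average Linpack efficiencies by interconnect, as quoted in the article (per cent).
EFFICIENCY_PCT = {
    "Gigabit Ethernet": 52,
    "10 Gigabit Ethernet": 61,
    "20 Gb/sec InfiniBand (DDR)": 74,
    "40 Gb/sec InfiniBand (QDR)": 92,
}


def linpack_efficiency(rmax, rpeak):
    """Sustained Linpack performance over theoretical peak, as a percentage."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return 100.0 * rmax / rpeak


def peak_needed(target_rmax, efficiency_pct):
    """Theoretical peak required to sustain target_rmax at a given efficiency."""
    return target_rmax * 100.0 / efficiency_pct


TARGET_TFLOPS = 100.0  # hypothetical sustained target, not from the article

for name, pct in EFFICIENCY_PCT.items():
    needed = peak_needed(TARGET_TFLOPS, pct)
    print(f"{name:28s} {pct:3d} per cent -> {needed:6.1f} teraflops of peak")
```

On these figures, a cluster on Gigabit Ethernet would need roughly 192 teraflops of peak capacity to sustain 100 teraflops, while one on QDR InfiniBand would need roughly 109. That gap is the server spending the faster network is competing against.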
Speed and low latency are not the only features that Mellanox will be touting in its IS5000 family of 40 Gb/sec switches this week. Prior InfiniBand switches from Mellanox supported fat tree configurations, but the new switches can do 2D mesh, 3D torus, and hybrid schemes, giving HPC customers more options for how the server nodes in their clusters are linked together (and thereby affecting how applications perform). The new switches also have adaptive routing, port mirroring (important for security), end-to-end congestion management, and subnet partitioning.
The FabricIT management software that Mellanox has cooked up will be another thing it tries to use as a lever. It has hardware configuration and management features at the host, chassis, switch, and fabric levels, as well as automated performance tuning and power management features.
Mellanox has certainly been affected by the economic meltdown, but has done better than many other IT suppliers, even if profits are under pressure. In 2008, the company's revenues rose by 28.1 per cent to $107.7m and it brought $22.4m of that to the bottom line, down 37.1 per cent. In the first quarter, Mellanox felt the pinch more, with sales of $22.6m (down 10.3 per cent) and net income of $2.1m (down 54.4 per cent). The company has given guidance for its second quarter sales to be in the range of $24m to $24.5m, which is a decline of 12 to 15 per cent. But after having gone public in February 2007, Mellanox is still holding on to $186.9m in cash and has a market capitalization of $368m.
While this is about half of its value in the wake of its IPO, the company's shares are clawing their way back upwards after falling to a quarter of their IPO valuation at the height of the economic meltdown last fall. The next $12 in share price is going to be a lot harder to achieve than the prior bump of $6 since the beginning of the year. It is also going to take a long time, unless someone decides that they want to be in the switch business instead of partnering, and artificially raises the valuation of Mellanox.
An acquisition of Mellanox seems unlikely, but not outside the realm of possibility for a storage vendor looking for some more play in servers. Server vendors will play it cool, letting QLogic, Mellanox, Voltaire, Blade Network and others be Switzerland, while Cisco tries to do the whole server and networking stack by its lonesome. ®