How to network at a supercomputing show
Voltaire's Vantage 8500 switches will get the Collective Accelerator feature, too, alongside the InfiniBand director switches (both 20 Gb/sec and 40 Gb/sec versions) when UFM 3.0 is released in the first quarter of 2010. The current release of UFM is 2.2, and it does not have this MPI offloading capability.
The performance enhancement that HPC customers will see from UFM will vary by workload, but Somekh says the gains from UFM tuning in early customer trials range "from dozens to hundreds of percent improvement" in network performance. As for the Collective Accelerator module, Somekh says that offloading to the switches can reduce the time spent on MPI collective operations by 90 per cent, cutting total MPI runtime by as much as 40 per cent.
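To see how those two Collective Accelerator figures might fit together, here is a back-of-the-envelope sketch. It assumes the 90 per cent refers to time spent in collectives and that nothing else in the job speeds up or slows down; both are our assumptions, not Voltaire's stated methodology, and the function names are made up for illustration.

```python
def runtime_saving(collective_fraction, collective_reduction=0.90):
    """Fraction of total MPI runtime saved when only the collective share shrinks."""
    return collective_fraction * collective_reduction


def collective_fraction_for(total_saving, collective_reduction=0.90):
    """Collective share of runtime needed to reach a given overall saving."""
    return total_saving / collective_reduction


if __name__ == "__main__":
    share = collective_fraction_for(0.40)
    print(f"Collectives would need to be about {share:.0%} of MPI runtime")
    for fraction in (0.10, 0.25, 0.44):
        saved = runtime_saving(fraction)
        print(f"{fraction:.0%} of runtime in collectives -> {saved:.0%} saved overall")
```

On these assumptions, the 40 per cent best case implies a job that spends a little under half its MPI time in collectives; a code that spends a tenth of its time there would save far less.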
Another new feature of UFM is called Adaptive Suite, which bundles Adaptive Computing's Moab cluster management tool with UFM. Moab can orchestrate the provisioning of cluster resources through individual server, storage, network, operating system, and application provisioning tools. You could think of Moab as air traffic control, and the other tools as pilots that listen to ATC. The integrated UFM-Moab product will also arrive with UFM 3.0.
Fujitsu does switches, too
Server maker Fujitsu has a division called Frontech that makes ATMs, point of sale terminals, other display devices and, believe it or not, network switches. So Fujitsu was also on hand at SC09, not just to talk about its future eight-core Sparc64-VIIIfx processors and the supers that will use them, but also to talk about layer 2 switches.
Specifically, Fujitsu announced the XG2600, a 10 GE layer 2 switch that puts 26 ports into a 1U chassis. It uses SFP+ optical modules and can use SFP+ twinax copper cables. The unit's spec sheet says it can deliver up to 520 Gb/sec of aggregate bandwidth with switching latency as low as 300 nanoseconds. Fujitsu is also claiming it can deliver this kind of performance at under 5 watts of power consumption per port.
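The spec sheet numbers are easy to sanity-check. The sketch below assumes the 520 Gb/sec figure counts both directions of every full-duplex port, which is our reading of the spec sheet rather than something Fujitsu spelled out; the constant and function names are ours.

```python
PORTS = 26               # 10 GE ports in the 1U chassis
LINE_RATE_GBPS = 10      # per port, per direction
MAX_WATTS_PER_PORT = 5   # Fujitsu's claimed upper bound


def aggregate_bandwidth_gbps(ports, line_rate_gbps, full_duplex=True):
    """Total switching bandwidth, counting both directions when full duplex."""
    directions = 2 if full_duplex else 1
    return ports * line_rate_gbps * directions


def power_ceiling_watts(ports, watts_per_port):
    """Upper bound on power draw implied by a per-port figure."""
    return ports * watts_per_port


if __name__ == "__main__":
    print(aggregate_bandwidth_gbps(PORTS, LINE_RATE_GBPS), "Gb/sec aggregate")
    print("under", power_ceiling_watts(PORTS, MAX_WATTS_PER_PORT), "watts for all ports")
```

Counted that way, 26 ports at 10 Gb/sec in each direction gives the quoted 520 Gb/sec, and the per-port power claim works out to under 130 watts for a fully loaded box.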
Those are pretty good numbers when you consider that Arista Networks was on the show floor bragging that in actual benchmark tests, its 7148SX 48-port 10 GE SFP+ layer 2/3 switch was able to demonstrate "extraordinarily low latency" of 600 nanoseconds. (You can see the benchmark tests validating this performance here.) When you read the report, you see that 48 ports is a lot to cram into a 1U form factor, and that average latency is more like 1,273 nanoseconds.
Arista, like many switch makers, is using silicon from Fulcrum Microsystems. Fulcrum was also at the show, peddling a whitebox switch built with partner Teranetics that uses Fulcrum's FocalPoint FM4224 10 GE switch chip - the same one used by Arista. This has been paired with Teranetics' dual-port, triple-rate TN20225 10GBase-T physical-layer device to make a 1U switch that has 20 10GBase-T ports and four SFP+ ports.
This whitebox - code-named "Monte Carlo" - is available on an OEM basis for $900 per port. So, if you want to try to take on Andy Bechtolsheim, one of the founders of Sun Microsystems and the brains behind Arista Networks, here's your chance.
InfiniBand, 10 GE, and Gigabit Ethernet in HPC
One last interesting bit of networking news coming out of SC09 last week: the distribution of interconnects among the Top 500 supers ranking. Obviously, the fastest 500 machines are not indicative of the current state of cluster interconnects, but rather a kind of leading indicator of what will be normal sometime down the road and what is fading from the market.
For all the talk about 10 GE switches, there is only one machine using that technology on the current Top 500 list. There are, by contrast, 13 machines using QDR InfiniBand, another 31 using DDR InfiniBand, and another 137 using regular old 10 Gb/sec InfiniBand. Mellanox says it has a 37 per cent share of the InfiniBand switches (by machine count, not ports) on the Top 500 list, and Voltaire says that it has just north of 50 per cent IB share on the list.
But there are plenty of cheapskates, even in the upper echelon of supercomputing. Another 258 machines are based on - we can say it - unimpressive Gigabit Ethernet switching between supercomputing nodes. Just remember, it isn't how big your switch is, but what you do with it that counts. The New York Stock Exchange has Gigabit Ethernet guts.
There are another 15 machines using Cray's "SeaStar" XT family interconnect, three using Quadrics interconnects (but Quadrics is dead, so that will change soon enough), seven using Myrinet interconnect, three using SGI's NUMAlink, and the rest using a variety of federation, fat tree network, 3D torus or proprietary interconnects. ®
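For anyone who wants to tally the interconnect counts quoted above, here is a minimal sketch. The counts are the ones cited in this story; the dictionary labels and function names are ours, and the "everything else" figure is simply what is left over from 500.

```python
TOP500_TOTAL = 500

# Machine counts by interconnect, as quoted in the story.
COUNTS = {
    "Gigabit Ethernet": 258,
    "InfiniBand 10 Gb/sec": 137,
    "InfiniBand DDR": 31,
    "InfiniBand QDR": 13,
    "Cray SeaStar": 15,
    "Myrinet": 7,
    "Quadrics": 3,
    "SGI NUMAlink": 3,
    "10 GE": 1,
}


def infiniband_total(counts):
    """Sum the machines across all InfiniBand speeds."""
    return sum(n for name, n in counts.items() if name.startswith("InfiniBand"))


def unlisted(counts, total=TOP500_TOTAL):
    """Machines left over for the other interconnects not counted individually."""
    return total - sum(counts.values())


def share_percent(machines, total=TOP500_TOTAL):
    """Share of the list, in per cent."""
    return 100.0 * machines / total


if __name__ == "__main__":
    ib = infiniband_total(COUNTS)
    print(f"InfiniBand: {ib} machines, {share_percent(ib):.1f} per cent")
    gige = COUNTS["Gigabit Ethernet"]
    print(f"Gigabit Ethernet: {gige} machines, {share_percent(gige):.1f} per cent")
    print(f"Everything else: {unlisted(COUNTS)} machines")
```

The three InfiniBand speeds add up to 181 machines, a bit over a third of the list, while Gigabit Ethernet alone accounts for just over half.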