Voltaire brings InfiniBand switch to the masses

Accelerators speed up cluster work

InfiniBand and Ethernet switch maker Voltaire this morning rolled out its Grid Director 4200, a midrange 40 Gb/sec InfiniBand switch that shoots the gap between its entry and high-end switches. It is the product that Voltaire expects companies to buy as they adopt InfiniBand for database clustering and other HPC jobs.

According to Asef Somekh, vice president of marketing at Voltaire, the quad data rate Grid Director 4200 is aimed at companies that have more modest needs than the big supercomputer labs that the Grid Director 4700 was primarily designed for.

The Grid Director 4700 launched last June, sporting 324 InfiniBand ports running at the full-tilt-boogie of 40 Gb/sec on its 51.8 Tb/sec backplane, with the capability of doubling up to 648 ports if you need to go nuts building a huge cluster. Which, by the way, plenty of HPC labs do, which is why Voltaire put the monster switch into the field first. Now that the technology has ramped and the economy is recovering a bit, Voltaire is ready to drop a modular switch that will be more appealing to companies with more modest InfiniBand needs.
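For the curious, the 4700's backplane rating appears to line up with the doubled-up port count if you assume the figure counts traffic in both directions on every port. That assumption is ours rather than anything Voltaire has stated, and the helper name below is purely illustrative. A quick sketch of the arithmetic:

```python
# Back-of-the-envelope check on the Grid Director 4700 backplane figure.
# ASSUMPTION (not from Voltaire): the rating counts both directions per port.

def aggregate_tbps(ports, gbps_per_port=40, directions=2):
    """Aggregate bandwidth in Tb/sec for a given number of ports."""
    return ports * gbps_per_port * directions / 1000


if __name__ == "__main__":
    for ports in (324, 648):
        print(f"{ports} QDR ports: {aggregate_tbps(ports):.2f} Tb/sec")
```

Under that assumption, the quoted 51.8 Tb/sec matches the fully loaded 648-port configuration rather than the base 324-port one.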

The new Grid Director 4200 is aimed at companies that need more bandwidth and ports than the Grid Director 4036E, a 1U box that debuted in January with 34 QDR InfiniBand ports as well as two ports that link into Gigabit or 10 Gigabit Ethernet networks. The Grid Director 4036E is really aimed at financial services companies - brokerages and hedge funds mostly - that are co-locating their trading systems at stock exchanges and need to support both InfiniBand and Ethernet protocols but do so in compact form factors with low energy consumption and super-low latencies.

The InfiniBand ports of the Grid Director 4036E are rated at 100 nanoseconds on a port-to-port hop, and the jump from the Ethernet to InfiniBand networks through the gateway adds about 2 microseconds. There is a Grid Director 4036 that has 36 QDR InfiniBand ports and no gateway as well.

The Grid Director 4200 is a modular switch that you add line and fabric boards into, like the 4700 machine, rather than a rack-mounted, sealed box, like the 4036 boxes. The 4200 has an 11U chassis and has room for nine line boards and four fabric boards and a backplane rated at 11.5 Tb/sec; it can host up to 162 QDR ports, and unlike the 4700, the 4200 does not allow you to double up. Port latency on this machine ranges from 100 to 300 nanoseconds, according to Voltaire.

This size machine will hit the sweet spot of the commercial (rather than the technical) HPC market, according to Somekh. The Grid Director 4200 will be available at the end of March. Pricing has not been set yet, and would likely not be divulged even if it were because that is the way of the high-end networking racket, as it is with high-end servers and storage arrays.

In addition to the new InfiniBand switches, Voltaire has tweaked its Unified Fabric Manager software, which manages the InfiniBand and Ethernet switches and accelerates their traffic, to include a new feature called Fabric Collective Accelerator. The FCA feature will eventually accelerate different kinds of operations commonly performed on HPC clusters, but the first iteration will juice the speed of collective operations - those that broadcast data, gather data, or otherwise synchronize the nodes in a cluster - of the Message Passing Interface (MPI) protocol commonly used in supercomputing clusters.
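For readers who have not bumped into MPI collectives, here is a minimal sketch of what broadcast, gather, and a combine-and-share operation do to the data held by each node. It is a plain Python simulation on a list of per-node values. It makes no real MPI calls, the function names are our own, and it says nothing about how FCA is implemented:

```python
# Toy simulation of three MPI-style collective operations.
# Each list element stands in for the value held by one cluster node.
# These are illustrative helpers, not MPI library functions.
from functools import reduce
import operator


def broadcast(root_value, n_nodes):
    """Every node ends up with a copy of the root's value."""
    return [root_value] * n_nodes


def gather(node_values):
    """The root collects one value from every node, in rank order."""
    return list(node_values)


def allreduce(node_values, op=operator.add):
    """Combine all values, then hand every node the combined result."""
    combined = reduce(op, node_values)
    return [combined] * len(node_values)


if __name__ == "__main__":
    partial_sums = [1.5, 2.5, 3.0, 4.0]   # one partial result per node
    print(broadcast(42, 4))
    print(gather(partial_sums))
    print(allreduce(partial_sums))
```

In a real cluster each of these steps means messages crossing the fabric, and nodes sit idle until the operation completes, which is why they are a target for acceleration.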

Last fall, you will remember, switch and adapter maker Mellanox used a golden screwdriver upgrade on its ConnectX-2 host channel adapters that allowed them to similarly accelerate MPI collective operations. The Mellanox cards also sport a floating point math unit that can do some of the number-crunching work in HPC applications, freeing up CPUs on the server nodes in the cluster to do even more work.

Somekh says that the acceleration of MPI collective operations properly belongs in the switch, not in the adapter cards, something that Mellanox may eventually add to its own switches. (Who says there can't be acceleration on both ends of the wire, and that it cannot be coordinated? The real question is what happens when a Voltaire switch and a Mellanox HCA are both trying to accelerate MPI operations?)

Somekh adds that what Mellanox is doing on its cards is reducing the size of MPI messages, which is helpful, but what Voltaire is doing at the switch level is cutting the number of MPI messages that are flying around between server nodes in the cluster.
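To see why the number of messages matters as much as their size, consider a hedged illustration. Voltaire has not detailed how FCA cuts message counts, so the two schemes below are textbook examples chosen by us, not a description of the product. One has every node send its value directly to every other node. The other combines values up a tree and sends the result back down:

```python
# Message counts for two generic ways of sharing a combined value
# across n nodes. ASSUMPTION: these are textbook schemes picked for
# illustration; they are not taken from Voltaire's FCA design.

def naive_messages(n):
    """Every node sends its value to each of the other n - 1 nodes."""
    return n * (n - 1)


def tree_messages(n):
    """Combine up a tree (n - 1 messages), then broadcast the result
    back down the same tree (another n - 1 messages)."""
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (8, 162, 648):
        print(f"{n} nodes: naive={naive_messages(n)}, tree={tree_messages(n)}")
```

The naive count grows with the square of the node count while the tree count grows linearly, so the gap widens quickly as clusters get bigger.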

According to Somekh, depending on the workload, anywhere from 50 to 80 percent of the time, a cluster is doing these MPI collective operations rather than just running calculations. That is a lot of waiting for data to either come or go.

In early benchmark tests, Voltaire has been able to reduce the waiting time on MPI collective operations by as much as a factor of ten. Importantly, the use of the FCA feature to goose MPI applications requires no changes to applications running on the cluster. The FCA feature is loosely based on the messaging accelerators that Voltaire created specifically for the financial services industry, which are also a separately sold add-on to Unified Fabric Manager.
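Putting Somekh's two figures together gives a rough feel for what this could mean for a whole job. The sketch below applies the standard Amdahl's Law formula. It assumes, optimistically and on our own account, that the full factor of ten applies to all of the time spent in collectives and that nothing else changes. Voltaire has made no such whole-application claim, and the function name is ours:

```python
# Rough whole-job speedup if only the collective-operation time shrinks.
# ASSUMPTION: the best-case factor of ten applies to all collective time;
# this is our arithmetic, not a Voltaire benchmark result.

def overall_speedup(collective_fraction, collective_speedup=10.0):
    """Amdahl's Law: speedup = 1 / ((1 - f) + f / s)."""
    remaining = (1.0 - collective_fraction) + collective_fraction / collective_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    for f in (0.5, 0.8):
        print(f"{f:.0%} of time in collectives: {overall_speedup(f):.2f}x overall")
```

On those assumptions, a job spending 50 to 80 percent of its time in collectives would finish roughly 1.8 to 3.6 times faster, with the untouched compute time setting the ceiling.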

Eventually, Voltaire will tweak the FCA feature so it can accelerate other kinds of work where similar gathering and broadcasting work is done on a big cluster. For example, Map/Reduce big math data crunching is the next workload that Voltaire will accelerate as part of the FCA add-on to Unified Fabric Manager. ®
