Who's going to turn base Ethernet into gold?

Original URL: https://www.theregister.com/2009/02/17/dce_progress/

Tackling drops and delays

Posted in Networks, 17th February 2009 10:28 GMT

Data Centre Ethernet (DCE) is the great white hope of convergence, the single über-network across which all other protocols will flow, simplifying network component acquisition and operating costs - but it's not that simple.

Ethernet is a fragile, unreliable base for such a role; it drops packets, and message transfer time across the network is not predictable. Network links such as the Fibre Channel ones between servers and block-access storage devices will break if packets (data in frames) are lost or if message transfer takes too long.

How can the base metal that is Ethernet be transmuted into gold? The alchemists are to be found inside the IEEE standards organisation and are working on three 802.1 committees, known as the Qau, Qbb, and Qaz workgroups. David Law, a 3Com consultant engineer and chair of the 802.3 committee, explained the background.

First of all, he said, a simple point-to-point Ethernet link does not drop packets and message latency is predictable. When this pure Ethernet link is complicated by switches at either end combining other links and pumping their data packets along our original link, then it can get overwhelmed, packets can be dropped and messages take longer to traverse it. It's the switches where congestion forms and the switches will have to be involved in solving it.

Separate IEEE 802.something committees cover different aspects of the Ethernet stack. The 802.3 committee which Law chairs deals with the peer-to-peer stuff, "above the MAC". An 802.1 committee looks at switching and an 802.11 one covers Wi-Fi.

For an area of an Ethernet network to be lossless (ie it doesn't drop packets) and to have deterministic latency (so messages don't take too long to cross the network from sender to receiver), there will have to be bridges between the DCE domain and common, everyday Ethernet. Such data centre bridging is a project in the 802.1 world, where the three committees mentioned above are located. Three committees are needed to deal with the two problems: two for packet loss and latency, and one for DCE-class device identification.

Each is tasked with devising a solution to its allocated aspect of the DCE problem that will command broad support in the networking industry and become a standard, permitting different suppliers' kit to interoperate in a DCE-class network.

The three committees

The IEEE 802.1 Qau Congestion Notification Committee deals with the detection of imminent congestion. A DCE switch or Congestion Point (CP) monitors the queue of outgoing packets and samples packets. If the queue depth exceeds a set length then a congestion notification message (CNM) is sent to a sender, the packet originator or reaction point (RP), in effect telling it to throttle back its packet transmission rate. A rate limiter in the RP reduces the frame rate by the amount requested in the CNM.
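The feedback loop just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the Qau algorithm itself: the class names, the proportional feedback rule, the 50 per cent cap and the per-tick model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CongestionPoint:
    """Hypothetical switch egress queue that watches its own depth."""
    threshold: int  # queue depth (in frames) above which a CNM is sent
    depth: int = 0

    def enqueue(self, frames: int) -> Optional[float]:
        """Queue frames; return the requested rate cut (0..0.5), or None."""
        self.depth += frames
        if self.depth <= self.threshold:
            return None
        # Assumed feedback rule: cut in proportion to the overshoot, max 50%.
        overshoot = (self.depth - self.threshold) / self.threshold
        return min(0.5, overshoot)

    def drain(self, frames: int) -> None:
        """Forward up to `frames` frames out of the queue."""
        self.depth = max(0, self.depth - frames)


@dataclass
class ReactionPoint:
    """Hypothetical sender with a rate limiter."""
    rate: float  # frames sent per tick

    def on_cnm(self, cut: float) -> None:
        """Throttle back by the fraction requested in the CNM."""
        self.rate *= 1.0 - cut


def simulate(ticks: int, send_rate: float, drain_rate: int,
             threshold: int) -> List[Tuple[int, float]]:
    """Run the loop and record (queue depth, sender rate) after each tick."""
    cp = CongestionPoint(threshold=threshold)
    rp = ReactionPoint(rate=send_rate)
    history = []
    for _ in range(ticks):
        cut = cp.enqueue(int(rp.rate))
        if cut is not None:
            rp.on_cnm(cut)
        cp.drain(drain_rate)
        history.append((cp.depth, rp.rate))
    return history
```

Calling `simulate(20, 100, 60, 100)` models a sender offering 100 frames per tick to a switch that can forward only 60. The sketch only ever slows the sender down; a real rate limiter would also need a way to speed up again once the congestion has cleared.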

There is a problem here in that congestion detection and correction have their own latency. By the time congestion is detected, packets are already in the queue. It is possible that a sudden burst of packets due to, say, a surge in server traffic, could overwhelm a switch before the congestion detection feedback loop has a chance to start working. This is where the Qbb Priority-Based Flow Control committee comes in.

It deals with priority-based flow control, the bus lanes or multi-occupancy vehicle lanes of the motorway. A certain proportion of a link's bandwidth can be set aside for specific traffic. If congestion builds up, its impact on important traffic can be limited in this way, so that a sudden storm surge of packets is kept outside the guaranteed-bandwidth section of the link, the lossless part. Thus the packet drop and latency problems are dealt with by a combination of the Qau and Qbb committees' work.
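The bus-lane idea can be shown with some simple arithmetic. The Python sketch below follows the description above rather than any draft standard: the function name, the percentage parameter and the choice to let best-effort traffic borrow idle reserved capacity are all assumptions made for illustration.

```python
from typing import NamedTuple


class TickResult(NamedTuple):
    storage_sent: int   # frames sent in the reserved, lossless lane
    storage_held: int   # reserved-class frames held back, not dropped
    lan_sent: int       # best-effort frames sent
    lan_dropped: int    # best-effort frames dropped


def share_link(capacity: int, reserved_percent: int,
               storage_load: int, lan_load: int) -> TickResult:
    """Divide one tick of link capacity between two traffic classes.

    `capacity` and the loads are in frames per tick. The reserved class
    (say, storage traffic) is guaranteed `reserved_percent` of the link.
    """
    reserved = capacity * reserved_percent // 100
    storage_sent = min(storage_load, reserved)
    # Assumption: excess lossless traffic is paused at the sender, not lost.
    storage_held = storage_load - storage_sent
    # Assumption: best effort may use whatever the reserved lane leaves idle.
    lan_sent = min(lan_load, capacity - storage_sent)
    return TickResult(storage_sent, storage_held, lan_sent,
                      lan_load - lan_sent)
```

With `share_link(1000, 40, 300, 2000)`, a surge of 2,000 best-effort frames hits a 1,000-frame link, yet all 300 storage frames still get through: the surge can only consume the 700 frames of capacity that the storage class is not using.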

The third committee, the 802.1 Qaz Enhanced Transmission Selection project, deals with DCE-class device identification. How does an Ethernet switch which is DCE-capable know that any other switch it is in contact with is a DCE-capable one too? The existing Ethernet priority scheme, IEEE Standard 802.1Q, is inadequate, because no minimum bandwidth is provided for any traffic class. The Qaz project uses a DCB (Data Centre Bridge) Capability Exchange Protocol (DCBX) to accomplish this.

A Qaz project slide says: "Using priority-based processing and bandwidth allocations, different traffic classes within different traffic types such as LAN, SAN, IPC, and management can be configured to provide bandwidth allocation, low-latency, or best effort transmit characteristics."

DCBX is based, Law says, on the IEEE Std 802.1AB Link Layer Discovery Protocol (LLDP).

If two Ethernet devices declare they are DCE-capable then the traffic between them can be lossless and of predictable latency, using the Qau and Qbb control mechanisms. They form a cluster of DCE-class Ethernet devices, a DCE cloud within the general Ethernet.
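How a DCE cloud might form from pairwise declarations can be sketched as follows. The `Advertisement` record and its two flags are hypothetical stand-ins for whatever DCBX actually carries over LLDP; the example only illustrates the rule that a link is treated as DCE when both ends say they are capable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical capability record a device announces to its neighbour."""
    device: str
    congestion_notification: bool  # supports the Qau mechanism
    priority_flow_control: bool    # supports the Qbb mechanism


def link_mode(a: Advertisement, b: Advertisement) -> str:
    """A link is DCE-class only if both ends declare both capabilities."""
    both_capable = (a.congestion_notification and b.congestion_notification
                    and a.priority_flow_control and b.priority_flow_control)
    return "dce" if both_capable else "ordinary-ethernet"


def dce_cloud(links: Iterable[Tuple[str, str]],
              adverts: Dict[str, Advertisement]) -> Set[str]:
    """Return the devices that sit on at least one DCE-class link."""
    cloud: Set[str] = set()
    for left, right in links:
        if link_mode(adverts[left], adverts[right]) == "dce":
            cloud.update((left, right))
    return cloud
```

Given two capable switches and one legacy switch hanging off the second, only the link between the capable pair comes up as DCE, so the cloud contains just those two devices and the legacy switch stays in everyday Ethernet.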

Setting standards in stone

The three committees or projects - Qau, Qbb and Qaz - have to individually agree amongst their members that their work is complete, and a 75 per cent majority in a workgroup ballot is needed at a minimum. Once that hurdle is passed then there is a general IEEE or sponsor ballot, equivalent to a public review. If that is in favour then the project becomes a standard. Thus for DCE to become a standard, the Qau, Qbb and Qaz workgroups have to pass their workgroup ballots and then the IEEE public ballot has to be passed.

This final set of ballots could be completed in early 2010, say in Q1 or the Q1/Q2 transition area. Commercial, fully DCE-compliant products could appear in Q2/Q3 2010. So-called intercept DCE products could appear in Q1 2010 as suppliers build the Qau, Qbb and Qaz workgroup functionality into their products before the public ballot, intercepting the developing standard, and potentially commit to upgrading firmware or whatever else is necessary to meet any final changes en route to formal standardisation.

Once DCE is a standard then Fibre Channel over Ethernet (FCoE) has the platform it needs to become a reality and Fibre Channel SAN access can begin a migration away from physical Fibre Channel towards software Fibre Channel sending messages over the DCE cloud. It ought to be feasible to begin pilot development FCoE work in Q1 2010, if not before, depending upon the status of vendors' DCE intercept products.

You could set up a trial DCE cloud with CNAs (Converged Network Adapters) layering FCoE messages onto DCE and sending them via DCE-class switches to storage arrays, then check that the Fibre Channel link between servers and storage arrays works and remains intact as you throw a blizzard of congestion-inducing packets in sudden storms of traffic at the DCE-class switches.

Either they will maintain their virtual FC SAN traffic scheme intact or they will break and you will, no doubt, be able to tune parameters to manage the breakdowns. Or you won't, in which case it's back to the drawing board.

Early 2010 is going to be DCE proof of concept time. If it works then the beginning of the end of the physical Fibre Channel era will have arrived. That will be momentous. ®