Mixing network traffic types on Ethernet
A recipe for success - or disaster?
Expert Clinic: Is Fibre Channel over Ethernet (FCoE) a panacea for the difficulties inherent in running separate storage and general networks?
FCoE is not taking off and, indeed, there have been articles published that suggest it may die before it arrives. Let's look again at the idea of mixing general networking traffic and storage networking traffic. Is it, or can it be, a recipe for success?
When combining general and storage traffic on one network, are we mixing different kinds of road traffic, or trying to converge road and rail?
Tony Lock of Freeform Dynamics thinks this mixing can be done but needs a lot of careful preparation and monitoring. It won't work "out of the box."
Tony Lock - Programme Director, Freeform Dynamics
For much of the past decade the “networking” that connects users to their applications and services has often been taken for granted. But if general networking has been less than widely appreciated, the networks that tie servers to storage have been almost invisible to everyone bar the all-too-few skilled storage administrators who delve into the art. Many organisations are contemplating significant changes to their IT architectures, and IT vendors are promoting a raft of new technologies in the storage arena. This brings up the issue of whether it is feasible to bring specialised storage networks and general Ethernet data networks together.
General networking is now very firmly based around the TCP/IP protocol and Ethernet, while storage networking is still grappling with many protocols and network technologies. Among these, perhaps the most firmly entrenched is Fibre Channel, a lossless, deterministic protocol designed to ensure that any data sent to the storage disks gets there with minimum latency and very little chance of data corruption. These characteristics were not originally enshrined in the standard Ethernet protocols employed for general networking. While work to incorporate them has continued apace, support is still far from universal, and the need for new approaches, equipment and tools pushes up equipment costs.
The networks that tie servers to storage have been almost invisible to everyone bar the all-too-few skilled storage administrators who delve into the art.
So is it possible to get storage traffic and general network traffic to cost-effectively share a common cabling system, namely Ethernet? Technically the answer is yes, as protocols such as FCoE (Fibre Channel over Ethernet) and iSCSI have now matured sufficiently for mainstream adoption. It is apparent that organisations are beginning to contemplate converging their networking stacks as well as management – and with it the cabling infrastructures they employ.
The cost savings of cabling a single network for both storage and data are attractive, as is the flexibility potentially available for dynamic reconfiguration. But there are significant challenges between contemplation of such a change and things happening on the ground. For one, network cable infrastructures have very long lifetimes and replacing them is by no means simple to achieve.
Impactful initiatives for the data centre.
Despite this, as the chart indicates, organisations are becoming aware of the impact that converged networks could have, even if their implementations have yet to ramp up. That said, there are questions concerning the feasibility of running storage and application traffic on the same physical networks without either users or applications being affected by service degradation. Can you do it?
The answer is ‘yes’, but expecting it to just work out-of-the-box would be asking a bit much. Managing the complexity of convergence will require sophisticated traffic monitoring and management tools to ensure that service quality is maintained at desired levels. An understanding of the baseline of current usage and service quality, combined with projections of the future growth requirements of each service to be delivered over the network, is essential – yet few organisations undertake such management processes routinely today.
Using one network to support all forms of traffic will require new processes to be put in place as well as the use of new tools. The lack of much established best practice to date – coupled with significant costs and operational challenges – makes it highly likely that the adoption of combined networks will take place over years, not months.
Tony Lock is Programme Director at Freeform Dynamics, responsible for driving coverage in the areas of Systems Infrastructure and Management, IT Service Management, Outsourcing, and emerging hosting models such as Software as a Service and Cloud Computing.
Greg Ferro, our next expert, looks at the 'how' of the issue.
Greg Ferro - Network Architect and Senior Engineer/Designer.
A key element of the decision to move to a Network Fabric is understanding that a Storage Fabric performs like a SCSI cable, or a channel: it can be emulated and adapted for a dynamic Ethernet network. This is true for both FCoE and iSCSI. FCoE uses the encapsulation of Fibre Channel into an Ethernet frame as a simple wrapper. No modification of the FC frame is performed, except to ensure that the FC MTU is 1440 bytes. Both Fibre Channel and iSCSI are simple mechanisms for transporting SCSI commands from the host to the disk drive, where the disk drive is emulated as a LUN.
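The wrapper idea can be sketched in a few lines of Python. This is a simplified model, not a wire-accurate FC-BB-5 implementation: the Ethertype (0x8906) is the IEEE-assigned value for FCoE, but the delimiter code points and the reserved-field layout here are illustrative.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # IEEE-assigned Ethertype identifying FCoE traffic

def encapsulate_fcoe(dst_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Wrap an unmodified Fibre Channel frame in an Ethernet frame.

    Simplified layout: Ethernet header | FCoE header (version +
    reserved bits, then SOF delimiter) | FC frame | EOF delimiter.
    """
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    sof = b"\x36"   # an SOF delimiter code point (illustrative value)
    eof = b"\x41"   # an EOF delimiter code point (illustrative value)
    # 4-bit version plus reserved bits occupy the 13 bytes before the SOF
    fcoe_header = bytes(13) + sof
    return eth_header + fcoe_header + fc_frame + eof

# a dummy 36-byte FC frame between two made-up MAC addresses
frame = encapsulate_fcoe(b"\x01" * 6, b"\x02" * 6, b"\x00" * 36)
```

The point of the sketch is that the FC frame travels through untouched; only a fixed-size header and trailer are added around it.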
In data networks, emulating a “channel” is more difficult. Ethernet is designed as a lossy protocol and, because of this, applications that use Ethernet are designed to tolerate delay or even loss. For example, TCP/IP will recover any lost packets and re-order packets to recover from network loss. An Ethernet switch can buffer data to some extent; since Ethernet is lossy, the switch must decide which frame to transmit when a temporary overload occurs. Thus a short 100-millisecond burst of network traffic can be buffered for another 50 ms before being forwarded.
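That lossy buffering behaviour can be illustrated with a toy simulation (all numbers here are hypothetical): a tail-drop buffer absorbs what it can of a burst and silently discards the rest, leaving recovery to TCP at the endpoints.

```python
from collections import deque

def run_switch(arrivals, capacity, drain_per_tick):
    """Toy tail-drop switch buffer: frames beyond capacity are simply
    lost, and it is left to TCP at the endpoints to retransmit them."""
    buffer, delivered, dropped = deque(), [], []
    for burst in arrivals:
        for frame in burst:
            if len(buffer) < capacity:
                buffer.append(frame)
            else:
                dropped.append(frame)   # lossy: no back-pressure signal
        for _ in range(min(drain_per_tick, len(buffer))):
            delivered.append(buffer.popleft())
    while buffer:                        # drain once the burst subsides
        delivered.append(buffer.popleft())
    return delivered, dropped

# a 10-frame burst into a 6-frame buffer that drains 2 frames per tick
delivered, dropped = run_switch([list(range(10))], capacity=6, drain_per_tick=2)
```

The burst fits partially in the buffer and is forwarded a little later; everything beyond the buffer is gone, which is exactly the behaviour storage traffic cannot tolerate.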
Data traffic is able to use available bandwidth that is not used by storage because storage data is bursty and relatively small in volume.
Data Centre Switches are a new category of Ethernet switches that provide much greater bandwidth, performance, low latency and improved resilience.
When an Ethernet frame is received on a Switch, it is placed into a memory location and waits for forwarding. For storage traffic this isn’t acceptable since the end-to-end delay should be as low as possible to ensure that storage throughput is fast. The Ethernet Switch is configured to immediately service the buffer that contains storage data as well as to forward the frame at the next interval.
In the event that a buffer overflows, Fibre Channel protocol requires any switch or host in the path to actively signal back to the source that an overflow has occurred – so that the source can cease sending until a signal is received stating that sending can resume.
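In Fibre Channel proper this back-pressure is implemented with buffer-to-buffer credits: the sender may only transmit while it holds credits, and the receiver returns a credit as it frees each buffer, so the receive buffer can never overflow. A toy Python model of the mechanism (class and names are hypothetical):

```python
class CreditLink:
    """Toy model of Fibre Channel buffer-to-buffer credit flow control.

    The sender starts with a fixed number of credits, spends one per
    frame, and must pause when none remain; the receiver returns a
    credit (R_RDY-style) each time it processes a frame."""

    def __init__(self, credits: int):
        self.credits = credits
        self.rx_buffer = []

    def send(self, frame) -> bool:
        if self.credits == 0:
            return False            # source must cease sending
        self.credits -= 1
        self.rx_buffer.append(frame)
        return True

    def receiver_processes_one(self):
        self.rx_buffer.pop(0)
        self.credits += 1           # credit returned: sending may resume

link = CreditLink(credits=2)
link.send("f1")
link.send("f2")
blocked = link.send("f3")          # no credits left: refused
link.receiver_processes_one()
resumed = link.send("f3")          # credit returned: accepted
```

Because the sender is throttled before the buffer fills, loss is prevented rather than repaired, which is the key difference from plain Ethernet.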
Enhanced Transmission Selection (ETS, IEEE 802.1Qaz) provides a way to identify and group traffic so that Priority-based Flow Control (PFC, IEEE 802.1Qbb) can provide link signalling at each hop to pause the sending group if congestion occurs. The same 802.1Qaz standard is also the basis for dynamic provisioning, signalling configuration data between switches to ensure that the configuration is consistent between devices.
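The essence of PFC is that the pause applies per priority, not per link. A toy model (threshold and class names are hypothetical) shows a congested storage class being paused while best-effort LAN traffic keeps flowing:

```python
PAUSE_THRESHOLD = 4  # hypothetical per-priority buffer high-water mark

class PfcSwitchPort:
    """Toy model of Priority-based Flow Control (IEEE 802.1Qbb): each
    of the 8 priorities has its own queue, and only the congested
    priority is paused -- the other priorities keep flowing."""

    def __init__(self):
        self.queues = {p: [] for p in range(8)}
        self.paused = set()

    def receive(self, priority, frame):
        self.queues[priority].append(frame)
        if len(self.queues[priority]) >= PAUSE_THRESHOLD:
            self.paused.add(priority)       # per-priority PAUSE upstream

    def dequeue(self, priority):
        frame = self.queues[priority].pop(0)
        if len(self.queues[priority]) < PAUSE_THRESHOLD:
            self.paused.discard(priority)   # resume this priority
        return frame

port = PfcSwitchPort()
for i in range(4):
    port.receive(3, f"fcoe-{i}")   # storage class fills and is paused
port.receive(0, "lan-0")           # LAN class unaffected by the pause
```

Classic IEEE 802.3x pause would have stopped the whole link, including the LAN traffic; the per-priority behaviour is what makes lossless storage and lossy data coexist.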
The combination of PFC and ETS means that storage data can be selected and handled in the same way that a (Fibre Channel) SAN performs. Data traffic is able to use available bandwidth that is not used by storage, because storage data is bursty and relatively small in volume. And a DCB (Data Centre Bridging) network works equally well for iSCSI, NFS and FCoE – giving storage administrators more choices.
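The bandwidth-sharing side of ETS can be sketched as a simple allocator (a toy model with hypothetical figures, not the standard's actual scheduler): each traffic group is served up to its guaranteed share first, and whatever a group leaves unused is lent to groups that want more.

```python
def ets_share(guaranteed, demand, link_bw):
    """Toy Enhanced Transmission Selection allocator.

    guaranteed: minimum bandwidth per traffic group (Gbit/s)
    demand:     offered load per traffic group (Gbit/s)
    Groups get min(demand, guarantee) first; spare capacity is then
    redistributed to the groups with the most unmet demand."""
    alloc = {g: min(demand[g], guaranteed[g]) for g in demand}
    spare = link_bw - sum(alloc.values())
    for g in sorted(demand, key=lambda g: demand[g] - alloc[g], reverse=True):
        extra = min(spare, demand[g] - alloc[g])
        alloc[g] += extra
        spare -= extra
    return alloc

# a 10 Gbit/s link: storage is guaranteed 4 Gbit/s but only offers
# 2 Gbit/s, so LAN traffic borrows the unused headroom
alloc = ets_share({"storage": 4.0, "lan": 6.0},
                  {"storage": 2.0, "lan": 9.0}, link_bw=10.0)
```

This is the article's point in miniature: because storage traffic is bursty, the data traffic soaks up the headroom without ever eating into the storage guarantee.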
Greg Ferro describes himself as Human Infrastructure for Cisco and Data Networking. He works freelance and has spent time at financial institutions, service providers, resellers and dot coms, at both largish and smallish companies.
Combining the two types of traffic – general LAN stuff with storage traffic – is technically feasible, but only if, and it's a big if, you use the right kind of Ethernet gear and standards, such as DCB, PFC and ETS. It can work if you plan it carefully, provision your upgraded Ethernet infrastructure appropriately and monitor it closely. There are no "best practices" set down, and Lock suggests that your existing network management processes will need extending and modifying to work well.
It is obvious FCoE is no plug 'n play, drop-in replacement or upgrade for ordinary Ethernet. So far there is little indication that converged storage and general networking are taking off.
The biggest reason for this, of course, is the cost of upgrading Ethernet combined with the investigative and implementation difficulties of managing the switch to converged networks. FCoE is no panacea. Whether it is going to be a generally effective route to convergence, we just cannot say. ®
A couple of minor points
I have a couple of minor points:
1. The statement, "...FC MTU is 1440 bytes..." should be 2240.
2. The statement "When an Ethernet frame is received on a Switch, it is placed into a memory location and waits for forwarding. For storage traffic this isn’t acceptable since the end-to-end delay should be as low as possible to ensure that storage throughput is fast. The Ethernet Switch is configured to immediately service the buffer that contains storage data as well as to forward the frame at the next interval." >> I’m not sure what is meant by this. FCoE frames are routinely stored in a buffer and will be forwarded when bandwidth is available. As Greg mentions later, ETS (802.1Qaz) ensures that each priority is allocated a certain amount of bandwidth, and in cases where the bandwidth is being fully utilized, or there is upstream congestion, FCoE frames will be stored in a buffer (sometimes for extended periods of time).
No good will come of this
This is why I detest the idea of intermixing "disk" traffic across the normal data LAN. I've already had to endure countless shouting matches over who gets what priority QoS-wise, endless "marking" mashups, and resisted the urge to throttle someone in a never-ending meeting because of the insistence on 'converging' all of our services upon one poor wire (or fiber).
Separate LANs especially within a data center were created for a reason. Piling them all onto the same switching network even with independent VLANs is just asking for trouble as far as I'm concerned.
Now, where's my stone tablet and chisel? I've got an email to get out.
Even though we have iSCSI and FCoE today from some manufacturers, look at almost any best practice guide and, at least with iSCSI and NFS, they'll always recommend running jumbo frames whenever possible for the storage. Jumbo frames, I think, are in general still too risky to run on your "main" ethernet ports because of compatibility issues.
So the point is, even with ethernet, if you want best performance you need separate NICs on each server and separate cables to run jumbo frames. You can use the same ethernet switches, since switches can run jumbo/non-jumbo on the same ports; with server NICs I have not yet come across one, at least on linux and vmware, where you can have a physical port run both jumbo and non-jumbo frames (non-jumbo frames as in ENTIRELY non-jumbo, not jumbo with the ability to re-transmit the fragments if they are too big).
Jumbo frames are certainly not a requirement; you can (and many do) run just fine without them. But if you're running a serious operation with storage on ethernet, I think most people will opt for jumbo whenever possible.
I wonder if we'll ever get to "super jumbo" frame sizes. I mean, jumbo frames became popular back when gigE was starting out; now with 10Gig commonplace, and 40Gig coming, I think it would make sense to give the option to boost the frame size even higher.
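The arithmetic behind the jumbo-frame argument is easy to sketch. In this Python sketch the 38-byte per-frame figure bundles the Ethernet header, FCS, preamble and inter-frame gap, and 64000 stands in for a hypothetical "super jumbo" size; the exact numbers are illustrative.

```python
def frame_overhead(payload_bytes, mtu, header=38):
    """Frames needed for a bulk transfer and the total fixed per-frame
    overhead. The ~38 bytes of header, FCS, preamble and inter-frame
    gap are paid per frame regardless of MTU, so larger frames
    amortise it (and the per-frame CPU work) better."""
    frames = -(-payload_bytes // mtu)   # ceiling division
    return frames, frames * header

gib = 1 << 30  # a 1 GiB transfer
for mtu in (1500, 9000, 64000):        # standard, jumbo, "super jumbo"
    frames, overhead = frame_overhead(gib, mtu)
```

Going from 1500 to 9000 cuts the frame count (and hence interrupts and header bytes) by a factor of six; a hypothetical 64000-byte frame would cut it by a further factor of about seven.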