Force10 adds rack-topping Gigabit switch
Fat buffers for virt servers
Force10 Networks has been flattening out and speeding up networks since it was founded over a decade ago, and today it is fleshing out its product line with a server virtualization–friendly top-of-rack Gigabit Ethernet switch called the S60.
There are plenty of Gigabit Ethernet switches out there, but Steve Garrison, vice president of marketing at Force10, says his company has studied how top-of-rack switches cope with traffic from virtualized servers and figured there was room for one more with some different features.
While 10 Gigabit Ethernet switches are getting traction, in the top-of-rack (ToR) area — which Garrison says is the hot spot in the data center these days as companies begin to think at the rack level, not the individual server level — it is plain old Gigabit Ethernet that has the lion's share of the market, with over 90 per cent of ToR switch ports shipped in 2009, according to stats from networking-market watcher Dell'Oro Group.
Companies are deploying Gigabit Ethernet switches at the tops of their server racks, but the problem is that the increasing utilization of servers creates "rogue waves" on the network ports that can quickly swamp Gigabit Ethernet switches. The more VMs you add to a server, which is made possible by the addition of more CPU cores, memory, and I/O bandwidth in the iron, the better chance you have of creating a rogue wave of server traffic that can overwhelm the switch.
A port coming out of the server and going into the switch might burn only 10Mb/sec of bandwidth supporting two virtual machines, but when you push it up to four to ten VMs, you're talking somewhere around 200Mb/sec with spikes that get a bit higher than that, and once you are up to 20 VMs you're pushing Gigabit speeds with spikes that can be significantly higher due to the "bursty" nature of multimedia and storage applications running in the data center these days.
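To put the squeeze in concrete terms, here is a quick back-of-the-envelope sketch in Python using the article's rough tiers; the per-VM split and the headroom arithmetic are illustrative, not Force10 measurements:

```python
# The article's rough tiers: (VM count, observed average Mb/s on the port).
tiers = [(2, 10), (10, 200), (20, 1000)]
LINE_RATE_MBPS = 1000  # Gigabit Ethernet

for vms, avg in tiers:
    headroom = LINE_RATE_MBPS - avg
    print(f"{vms:2d} VMs: ~{avg:4d} Mb/s average "
          f"(~{avg / vms:.0f} Mb/s per VM), {headroom} Mb/s headroom for spikes")
```

At 20 VMs the average alone is at line rate, leaving no headroom at all for the bursty spikes the article describes.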
The fix for congestion is not necessarily to move to 10 Gigabit switches, according to Force10, but rather to push line rates in a Gigabit switch and give the device ultra-deep buffering to cope with those momentary rogue waves on the network.
That's what the S60 switch does. The switch comes in a 1U chassis with 44 ports (which support 10Mbit, 100Mbit, and Gigabit speeds) and four SFP Gigabit ports, as well as two high-speed expansion slots for adding up to four 10 Gigabit uplinks or stacking interfaces. Up to twelve S60 switches can be stacked, lashed together, and managed as a single switch.
The S60 has front-to-back airflow and redundant power supplies (AC or DC), which data centers want these days, and only burns 156 watts, which is a lot less than competing ToR switches — many of which are 10 Gigabit products, admittedly.
The S60 also has 1.25GB of deep packet buffering, which compares favorably with the 768MB in Arista Networks' 7048 switch and the 16MB in Cisco Systems' 4948-10GE switch. The Force10 Operating System (FTOS) inside the switch can partition the packet buffers on the fly, allocating more or less buffer to specific ports as traffic conditions require.
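Force10 does not publish how FTOS carves up that pool, so the following Python sketch is purely illustrative of the general idea (a shared pool re-divided toward whichever ports are backed up) and is not the actual FTOS algorithm:

```python
# Illustrative only: not the FTOS algorithm. Each port keeps a small
# reserved slice of the 1.25GB pool; the remainder is re-divided on the
# fly in proportion to each port's current backlog.
TOTAL_BUFFER_MB = 1280   # ~1.25GB shared packet buffer
RESERVED_MB = 2          # assumed per-port floor

def allocate(backlog_mb):
    """Return a per-port buffer allocation for the given queue depths."""
    pool = TOTAL_BUFFER_MB - RESERVED_MB * len(backlog_mb)
    total = sum(backlog_mb) or 1  # avoid dividing by zero when all idle
    return [RESERVED_MB + pool * b / total for b in backlog_mb]

# A burst hits port 0: it temporarily gets nearly the whole pool.
print(allocate([900, 1, 0, 0]))   # -> [~1272.6, ~3.4, 2.0, 2.0]
```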
The S60 delivers 120 million packets per second of forwarding capacity and has a switch fabric capacity of 136Gb/sec, too, which is on par with other switches topping off racks.
The S60 top-of-rack switch is being tested at 17 hyperscale customers now and will begin shipping on June 30. The base unit (not including uplinks and stacking modules) costs $10,595. ®
1.25GB sounds like marketing rather than engineering
From my recollection, typical TCP sessions will retransmit after two seconds, so providing more buffering than that is futile. If a server is currently under heavy load, there will be a point where the additional buffering is just maintaining the overloaded state and preventing load-shedding (e.g. VMotion) until the box starts dropping connections anyway.
My guess is that for a fully loaded switch (44 x 1Gbps, 4 x 10Gbps), buffering in excess of maybe 0.25 seconds of traffic (i.e. a bit over 256MB of buffering for the whole switch) will achieve very little additional performance. It would be interesting to see how this switch performs in load testing compared to the other switches mentioned in the article.
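Working that through: the fill time for a shared buffer is roughly buffer size divided by the overload rate (ingress minus egress). The two scenarios below are assumptions, not published test results:

```python
# Fill time for a shared buffer under overload: buffer / (in - out).
BUFFER_GBITS = 1.25 * 8   # the S60's 1.25GB pool, in gigabits

scenarios = [
    ("ten 1Gb/sec servers bursting into one 1Gb/sec port", 10.0, 1.0),
    ("all 44 GbE ports into the four 10GbE uplinks",       44.0, 40.0),
]

for label, in_gbps, out_gbps in scenarios:
    fill_s = BUFFER_GBITS / (in_gbps - out_gbps)
    print(f"{label}: buffer rides out ~{fill_s:.2f}s of overload")
```

On these assumptions the pool can soak up between one and two-and-a-half seconds of sustained overload, right around the retransmission window the commenter is worried about.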
First of all, the story is a bit misleading. A machine with a GigE interface always sends at GigE speed. If you send 10 bytes, they go out at a rate of 1Gb/sec, which just means the transmission is short. Now, people looking at bandwidth graphs might see only 10Mb/sec of usage, but that just means the link was transmitting at 1Gb/sec one per cent of the time over the past sample period.
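A quick calculation makes the point; the frame size here is just an assumed full-size Ethernet frame:

```python
# Frames always leave a GigE NIC at line rate; average utilisation is
# really a duty cycle.
LINE_RATE_BPS = 1e9
frame_bytes = 1500   # assumed full-size Ethernet frame

wire_time_us = frame_bytes * 8 / LINE_RATE_BPS * 1e6
print(f"a 1500-byte frame serialises in {wire_time_us:.0f}µs")        # 12µs

avg_bps = 10e6       # the '10 meg' on the bandwidth graph
print(f"link busy {avg_bps / LINE_RATE_BPS:.0%} of the sample period")  # 1%
```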
The problem comes in where you have several machines talking at once. Imagine a switch with GigE ports and a GigE uplink, and ten servers attempting to send at exactly the same time: even if the data each is sending at that moment is short, the switch must store the burst coming in. If you have ten soda straws in and one soda straw out, you need a reservoir to hold some of that traffic until it can drain. You often see this with uplinks and back-end database machines, where you might have 100 front-end servers trying to talk at once to a single back-end database. The problem gets worse when you have ten virtual machines on one physical machine.
So one might say ... put a 10G interface on the database machine. That is fine for one direction, but what happens when the DB machine sends? It is transmitting at 10Gb/sec while the receiving interface can take only 1Gb/sec. Again, you need to buffer if you don't want to fall back on things like TCP backoff. If you start dropping packets, backing off TCP, and shrinking window sizes, you are killing throughput that could be preserved by simply adding some RAM to the switches for buffers.
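A worked example, with an assumed burst size:

```python
# A 10GbE database server answers a 1GbE client: the burst arrives ten
# times faster than it can drain, so the switch holds the difference.
burst_mb = 64                       # assumed burst size from the DB server
in_gbps, out_gbps = 10.0, 1.0

burst_gbits = burst_mb * 8 / 1000
arrive_ms = burst_gbits / in_gbps * 1000
drain_ms = burst_gbits / out_gbps * 1000
# While the burst arrives, only arrive_ms worth of egress has drained,
# so the switch must hold almost the entire burst.
peak_mb = burst_mb * (1 - out_gbps / in_gbps)
print(f"arrives in {arrive_ms:.0f}ms, drains in {drain_ms:.0f}ms, "
      f"peak buffer ~{peak_mb:.0f}MB")
```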
Yes, buffers can be bad for high-latency links, but on the local LAN they are generally a good thing.
The benefit of the deep buffers is in handling momentary spikes, which are often observed on individual machines.
When such a spike occurs, it is buffered while the egress line continues handling data at its maximum speed, provided the scheduler can keep up. With deep buffers, even very large spikes can be handled without affecting the clients and without requiring retransmission of packets dropped due to congestion.
The deep buffer cannot do anything for your hypothetical 1Gbps traffic through a 100Mbps bottleneck if the source keeps sending data at a uniform 1Gbps. In that case, you simply need to buy more bandwidth.
The solution presented by Force10 aims at handling traffic that fits within the available bandwidth most of the time, while allowing for occasional (though fairly large) spikes that don't. Because the buffers stave off congestion, packets do not need to be retransmitted (a lengthy process); they simply sit in the queue looking pretty and await their turn at the line, and the traffic suffers nothing worse than a bit of delay.
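To make that distinction concrete, here is a toy queue simulation; the buffer size, tick length, and traffic patterns are all assumptions rather than S60 specifics:

```python
# Toy single-port queue, 1ms ticks. A short spike is absorbed and merely
# delayed; a sustained overload fills the buffer and tail-drops.
def simulate(offered_gbps, buffer_mb=128, line_gbps=1.0):
    """offered_gbps: offered load in Gb/sec, one entry per 1ms tick."""
    queue_mb = dropped_mb = 0.0
    drain_per_ms = line_gbps / 8          # 1Gb/sec drains 0.125MB per ms
    for gbps in offered_gbps:
        queue_mb += gbps / 8              # MB arriving this tick
        if queue_mb > buffer_mb:          # pool exhausted: tail-drop
            dropped_mb += queue_mb - buffer_mb
            queue_mb = buffer_mb
        queue_mb = max(0.0, queue_mb - drain_per_ms)
    return dropped_mb

spike     = [5.0] * 100 + [0.5] * 900   # 100ms burst, then a quiet 900ms
sustained = [3.0] * 1000                # a full second of 3Gb/sec
print(f"spike:     {simulate(spike):.0f}MB dropped")      # 0MB
print(f"sustained: {simulate(sustained):.0f}MB dropped")  # ~122MB
```

The spike never fills the buffer, so nothing is dropped and the traffic is only delayed; the sustained overload fills it and then sheds a quarter of a megabyte every millisecond, which no amount of extra buffering would fix.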