I/O holds up the traffic in virtual systems
Clearing the bottlenecks
We can now host 512 virtual machines on a single physical server. That's a lot of virtual machines trying to squeeze a lot of I/O out of a single server's networking interfaces.
Meanwhile, vSphere 5 is out, and with it came VMware's controversial vRAM-based licensing. Customer feedback prompted some revisiting of that licensing philosophy.
VMware made a far more interesting move on the low end. The company upgraded the free version of the hypervisor with some new features, but cut the RAM allowance from 256GB to 32GB.
This naturally led to fierce debates on internet forums and gave tech blogs everywhere plenty to write about.
The change was bad for my customers. Determining what to do has occupied a great deal of my time and led to some interesting discoveries.
By late 2011, the virtualisation discussion has moved beyond “virtualising your servers”. You are either virtualised or you are not, in which case you are in a steadily evaporating minority that even niche players don’t really want to deal with anymore.
In a virtualised world, VMware makes the best operating system to install on the bare metal of your x86 server. Many competitors are struggling to match features that are two major versions old. At best, they are competing against VMware’s last product release.
Against this backdrop I must measure VMware’s changes to ESXi. Modern servers are so fast and so powerful that only VMware seems able to actually make use of them.
In many cases, the ability to fill up a system’s RAM with virtual machines means nothing if the underlying hypervisor can’t keep up with the demands on I/O.
I have servers in the field running eight gigabit network interface cards (NICs) providing both SAN access and client-facing network ports. These servers run only 128GB of RAM – an average of a mere 25 virtual machines apiece – and they absolutely flatten those NICs all day long.
I have begun upgrading the heaviest usage servers to shiny new 10Gbit NICs, only to discover that these systems are perfectly capable of flattening four 10Gbit NICs as well.
You never really appreciate how powerful a pair of Xeons is until you have asked yourself: “Exactly how much I/O do I have to feed this monster before I have bandwidth to spare?”
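To put that in perspective, here is a quick back-of-envelope sketch of the raw bandwidth available to each virtual machine under the old and new NIC configurations. It assumes, purely for illustration, that roughly 25 VMs share the aggregate link capacity evenly – something real workloads never do.

    # Back-of-envelope per-VM bandwidth, using the figures quoted above.
    # Assumes the aggregate link capacity is shared evenly across VMs,
    # which real workloads never do - this is only a sense-of-scale sketch.

    vms_per_host = 25           # average VMs per host quoted above

    configs = {
        "8 x 1GbE":  8 * 1.0,   # aggregate Gbit/s, old configuration
        "4 x 10GbE": 4 * 10.0,  # aggregate Gbit/s, upgraded configuration
    }

    for name, aggregate_gbit in configs.items():
        per_vm_mbit = aggregate_gbit * 1000 / vms_per_host
        print(f"{name}: ~{aggregate_gbit:.0f} Gbit/s aggregate, "
              f"~{per_vm_mbit:.0f} Mbit/s per VM if shared evenly")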
The processor is no longer the bottleneck in the data centre
I have rarely had moments with other hypervisors where I stood back and said: “Gigabit Ethernet is dead.” My high-demand VMware systems, however, constantly remind me of this truth.
Except for some very rare corner cases, the processor is no longer the bottleneck in the data centre. I/O is.
In this brave new world, he who can shuffle bits around the quickest wins. Being the best at coping with this I/O bottleneck is what allows VMware to get away with anything it wants – for the moment, at least.
Some of the systems I am running are two and even three years old. Their replacements will be able to house even more virtual machines and chow down on even more bandwidth.
My Christmas list now reads: "Dear Santa, I would like several gold bars, a pony and a crate of Intel quad-port 10 Gigabit Ethernet NICs. I am going to need them."
When contemplating the conundrum of VMware's new licensing, I find I have four choices.
I can choose to port my virtual estate to a competitor and accept that the I/O limitations of its hypervisors impose practical limits on how many virtual machines I can cram into a single host. Competitor offerings are also far less feature-rich, but even at the top end they are significantly cheaper.
I can stick with VMware’s free hypervisor, but this will see massive server sprawl as I constantly run up against the 32GB-per-server barrier.
I can choose to run a hybrid environment, keeping the high-I/O stuff on VMware and migrating everything else to competitors. Or I can just pay the tithe.
Whichever path I choose, one thing is abundantly clear. Whether we are talking hypervisors, networking, storage or any other aspect of modern server design, I/O has become the primary consideration.
Great news if you make hypervisors, storage or networking gear. Less so for my wallet. ®
Vmware, Xen (and the article author) are all missing the point
We have now reached the point where it is necessary to provide proper networking features at the v-Net layer, including correctly merging n x XG interfaces into m x virtual interfaces, with trunking and other network protocols working as needed on top of this.
Similarly, we have reached the point where it is necessary to have the more advanced OS features like QoS, policing, reservations, etc all working too.
Neither of these is on the v-world horizon. In fact, if you look at where the vendors are going, it is in completely the opposite direction - transparent dumb VLAN passing to VMs using PCI virtualisation, and killing all advanced OS networking features to achieve the required performance.
That already flattens out the network prior to any acceleration (as observed in the article). It cannot be the way forward. It is the way backward.
256 spindles is not all that abnormal.
What are you people babbling about now?! Bandwidth is almost never the bottleneck on storage and if you knew how to monitor properly you might know that.
A fibre channel disk is capable of around 200 IOPS, less for SAS, less still for SATA. Admittedly much, much more for SSD, but are you running an SSD SAN yet? Most people aren't!
Server 2008, SQL and Exchange use 8k blocks, so each I/O is around 8KB.
Using the 4Gb fibre mentioned in another post as being "saturated" as an example, we have approximately 400MB/s of bandwidth. Divide this by 8KB blocks and, ignoring the FC overhead, we get 51,200 I/O operations possible every second over your fibre.
Divide by 200 (IOPS per disk) and you get around 256 disks required to saturate your link (ignoring the cache, which must have filled for your bandwidth to saturate). That's assuming you just stripe and didn't use any RAID in your enterprise configuration. RAID 10 roughly halves the I/O, so 512 disks are required there. With the penalty of RAID 5 you would need 1024 drives.
Let's assume that you have shelves with 15 drives in them - a common configuration on SANs. You would need 34 shelves of disk just to saturate a single fibre connection on a SAN with RAID 10. And that's assuming you aligned your disks properly; misalignment would be a further 2x penalty.
Do you all have that many disks in your SAN environment? I'm ignoring, of course, that you don't have a single fibre connection, you have at least 2 :)
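For anyone who wants to replay that back-of-envelope maths, here is a quick sketch of the same calculation. The 400MB/s link figure, 8KB block size, 200 IOPS per disk and RAID penalties are the assumptions above, not measured values.

    # Reproduces the back-of-envelope maths above: how many disks it takes
    # to saturate a 4Gb FC link with small-block random I/O.
    # All inputs are the assumptions stated in the comment, not measurements.

    link_bandwidth_mb_s = 400    # usable bandwidth of a 4Gb FC link, roughly
    block_size_kb = 8            # SQL/Exchange-style 8KB I/O
    iops_per_disk = 200          # assumed for a fibre channel disk

    link_iops = link_bandwidth_mb_s * 1024 / block_size_kb   # ~51,200 IOPS
    disks_striped = link_iops / iops_per_disk                 # ~256 disks, pure stripe
    disks_raid10 = disks_striped * 2                          # RAID 10 penalty ~2x
    disks_raid5 = disks_striped * 4                           # RAID 5 penalty ~4x

    shelves_raid10 = disks_raid10 / 15                        # 15-drive shelves

    print(f"IOPS needed to fill the link: {link_iops:,.0f}")
    print(f"Disks needed (pure stripe):   {disks_striped:,.0f}")
    print(f"Disks needed (RAID 10):       {disks_raid10:,.0f}  "
          f"(~{shelves_raid10:.0f} shelves of 15)")
    print(f"Disks needed (RAID 5):        {disks_raid5:,.0f}")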