Original URL: https://www.theregister.com/2011/11/17/server_virtualisation/

Release the brakes on your virtual servers

Step up the pace

By Liam Proven

Posted in On-Prem, 17th November 2011 11:11 GMT

One of the dirty little secrets of virtualisation is the performance cost: operating systems running inside a virtual machine are slower than those running natively on the same hardware, sometimes by quite some margin.

This is termed virtualisation overhead, and with current whole-system virtualisation, it's a given. It always happens. The question is, how much.

Back in the days of ESX Server 3, VMware itself admitted that integer performance suffered an overhead of up to six per cent and more complex CPU operations up to 18 per cent. It claimed that Xen 3 was about twice as bad.

These days, things are not so serious, and the performance differential between the main hypervisor vendors has mostly evened out.

Stiff competition

Even so, bear in mind that those figures were measured on a single host running a single virtual machine. Contention for shared resources is also a significant issue, especially when it comes to disk storage, where virtual machines often share a single drive or array.

Most current virtualisation on x86 is whole-system virtualisation: each virtual machine is a complete emulated PC containing a complete PC operating system. The virtual machine's "disks" are actually files in a file system managed by a different operating system.
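To make that concrete, here is a minimal Python sketch – run on a Linux host, with an invented file name and size – showing that a thin-provisioned raw disk image is little more than a sparse file: it advertises its full capacity to the guest but occupies next to no space on the host until the guest actually writes data.

    import os

    # Hypothetical file name and size: a guest "disk" is just a big file
    # on the host. Truncating a fresh file to 20 GiB creates it sparse,
    # so it reports full capacity without allocating blocks yet.
    IMAGE_PATH = "guest-disk.img"
    SIZE_BYTES = 20 * 1024 ** 3

    with open(IMAGE_PATH, "wb") as img:
        img.truncate(SIZE_BYTES)

    print("apparent size:", os.path.getsize(IMAGE_PATH))
    print("space actually used:", os.stat(IMAGE_PATH).st_blocks * 512)

Production hypervisors layer their own formats – VMDK, VHD, qcow2 – and metadata on top, but the underlying principle is the same.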

All the componentry of that nice uniform hardware platform which lets you move virtual machines from host to host – network cards, motherboard chipset, graphics adaptor and so on – is not nice fast hardware; it is a set of software emulations running as part of the hypervisor.

Hardware extensions

Gradually, x86 chips are acquiring hardware extensions to assist in this emulation. The first generation of hardware virtualisation assist extensions was Intel's VT, introduced in some of the last models of Pentium 4, the 662 and 672, in 2005. AMD-V followed with the Socket AM2 Athlons in 2006.
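On a Linux machine it is easy to see whether a host CPU advertises those first-generation extensions. The following Python sketch – Linux-only, no extra tools assumed – simply reads /proc/cpuinfo and looks for the relevant flags: vmx for Intel VT, svm for AMD-V.

    # Linux-only sketch: /proc/cpuinfo carries a "flags" line per CPU.
    # "vmx" means Intel VT-x is available, "svm" means AMD-V.
    flags = set()
    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())

    if "vmx" in flags:
        print("Intel VT-x available")
    elif "svm" in flags:
        print("AMD-V available")
    else:
        print("No hardware virtualisation extensions reported")

The second-generation memory-management extensions discussed below show up in the same list, as ept on Intel chips and npt on AMD chips.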

This first generation of hardware virtualisation merely allowed hypervisors to create a "Ring minus-one" – essentially, trapping Ring 0 (kernel-mode) code and running it through a software emulator. The CPU-intensive process of mapping virtual machines' memory to the host's physical memory still had to be done in software.

This changed in 2007 with the arrival of AMD's second wave of hardware virtualisation, Rapid Virtualization Indexing (RVI), in the Barcelona generation of Athlons and Opterons. RVI implements a second, nested set of page tables in hardware, relieving the hypervisor of maintaining shadow page tables in software.

Page tables hold the map that translates an operating system's virtual addresses to physical RAM addresses. From inside a virtual machine, though, those "physical" addresses are themselves emulated: they actually refer to blocks of the host's RAM. RVI's second level of indirection accelerates the translation of memory addresses inside virtual machines into real physical memory addresses.
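A toy Python sketch – invented page numbers, 4KB pages – makes the two translation steps explicit: the guest's own page tables map its virtual pages to what it believes are physical pages, and a second, hypervisor-owned table maps those onto real host frames. With RVI or EPT the CPU walks both tables itself; without them, the hypervisor has to build and constantly resynchronise a combined shadow table in software.

    PAGE = 4096  # 4 KiB pages

    # Invented mappings, for illustration only.
    guest_page_table = {0x10: 0x22, 0x11: 0x23}   # guest-virtual page -> guest-"physical" page
    nested_page_table = {0x22: 0x7a, 0x23: 0x7b}  # guest-"physical" page -> host physical frame

    def translate(guest_virtual_addr):
        page, offset = divmod(guest_virtual_addr, PAGE)
        guest_physical_page = guest_page_table[page]          # first walk: the guest OS's tables
        host_frame = nested_page_table[guest_physical_page]   # second walk: the hypervisor's tables
        return host_frame * PAGE + offset

    addr = 0x10 * PAGE + 0x1f4
    print(hex(addr), "->", hex(translate(addr)))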

Hardware-assisted paging makes little difference to a virtual machine's pure CPU performance, but it significantly enhances memory-intensive workloads, to the tune of 42 to 48 per cent.

Intel's equivalent, Extended Page Tables (EPT), appeared with the Nehalem-family Core i3, i5 and i7 processors from late 2008 onwards.

For the moment, this is all hardware can do to help. The remaining techniques are a matter of software and system configuration.

In praise of paravirtualisation

Emulation is expensive, so another good way to boost performance is to avoid it. From the virtual machine's perspective, one way to do this is to modify the guest operating system or its drivers to be aware that they are running under a hypervisor.

For example, a guest operating system can be provided with special drivers that talk directly to the virtual network connecting the virtual machines to the host machine, rather than emulating a physical network card.

Similar methods can be applied to storage (such as SCSI and iSCSI), graphics, input devices and even memory management.

Microsoft calls this Enlightened I/O for its Hyper-V Server; support is built into Vista SP1, Windows Server 2008 and later, and drivers are available for Windows Server 2003, SUSE Linux Enterprise Server 10 SP3 and Red Hat Enterprise Linux 5.2 to 5.5.

VMware had a similar approach, the Virtual Machine Interface, which allowed Linux guests to communicate with the hypervisor, but this has now been outpaced by hardware virtualisation.

VMware also offers the vmxnet virtual NIC, as well as enhanced vmxnet2 and vmxnet3 drivers that offer TCP Offload Engine acceleration to virtual machines on suitably equipped hosts.

For Xen there are PV drivers, and for KVM there is Virtio; both offer analogous functionality.
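A quick way to tell whether a Linux guest is really using paravirtualised devices is to look at which kernel driver is bound to each network interface. The short Python sketch below – Linux-only, nothing beyond the standard library assumed – reads the driver symlinks under /sys/class/net: names such as virtio_net (KVM) or xen_netfront (Xen) indicate a paravirtualised path, while e1000 or rtl8139 point to an emulated NIC.

    import os

    # Each interface under /sys/class/net has a device/driver symlink
    # pointing at the kernel driver bound to it; the loopback device and
    # other purely virtual interfaces have no backing device at all.
    NET = "/sys/class/net"

    for iface in sorted(os.listdir(NET)):
        driver_link = os.path.join(NET, iface, "device", "driver")
        if os.path.islink(driver_link):
            driver = os.path.basename(os.readlink(driver_link))
        else:
            driver = "no backing device"
        print(f"{iface}: {driver}")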

Just passing through

The final, and in some ways most drastic, step is to avoid the emulation overhead by directly connecting virtual machines to physical hardware.

The simplest and theoretically cleanest way of doing this is by offloading storage, for instance, to a SAN; a virtual machine accessing a SAN in principle suffers no more slowdown than a physical server would.

As in the case of a physical server accessing storage over the network, though, this ideally means dedicating network interfaces to storage – which may require adding multiple network interface cards to the host and configuring dedicated routes between the virtual machines and the networked storage devices.

Hyper-V also supports pass-through disks, where a virtual machine can directly control a dedicated LUN of a storage device on the host machine.

Windows Server 2008 R2 adds a new feature, Cluster Shared Volumes, which allows multiple hosts to share access to a single storage LUN, adding a degree of scalability to pass-through disks.

VMware currently takes the prize in this department, though, with its ability to directly dedicate not only SCSI controllers, but as of ESX 4, entire physical PCI and PCIe devices to a specific virtual machine.

The VMDirectPath feature allows one or two PCI cards in the host machine to be connected to the operating system running in a specific virtual machine rather than being managed by the hypervisor itself – from a simple USB controller to a dedicated physical NIC or storage device.
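VMware configures VMDirectPath through its own management tools, but the idea rests on the same hardware foundation everywhere: an IOMMU (Intel VT-d or AMD-Vi) that safely remaps the device's DMA into a single virtual machine's memory. As a rough illustration of the equivalent host-side view on a Linux/KVM box – not how ESX does it – this Python sketch lists the IOMMU groups the kernel has worked out; devices sharing a group generally have to be passed through to a guest together.

    import os

    # /sys/kernel/iommu_groups appears only when the IOMMU (VT-d/AMD-Vi)
    # is enabled; each numbered group lists the PCI devices that must be
    # assigned to a guest as a unit.
    GROUPS = "/sys/kernel/iommu_groups"

    if not os.path.isdir(GROUPS):
        print("No IOMMU groups found - VT-d/AMD-Vi absent or disabled")
    else:
        for group in sorted(os.listdir(GROUPS), key=int):
            devices = os.listdir(os.path.join(GROUPS, group, "devices"))
            print(f"group {group}: {', '.join(sorted(devices))}")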

A slightly more modest optimisation is to place the swapfiles of Windows virtual machines directly onto the host's VMFS storage.

There is always a price to pay, though, and this one is a biggie: attaching dedicated devices to virtual machines – whether they are just disk partitions on the host server or physical interface cards and any attached devices – carries a significant drawback.

Although such techniques can deliver pretty much full native performance, they undermine some of the key advantages of virtualisation: the ability to snapshot virtual machines for backup purposes, to duplicate them and to migrate them from one host to another.

At best, virtual machines accessing external, non-virtual resources often need to be shut down before migration, and snapshots must also duplicate any external resources – eroding the scalability and fault tolerance that virtualisation is supposed to provide.

Differences of opinion

Virtualisation is now a key part of the x86 platform and it is not going to go away again. Further hardware advances will continue to improve speed and reduce overhead but there's a long way to go before x86 servers can match the performance and scalability features of systems that have been doing virtualisation for decades, such as IBM's System z mainframes.

On the other hand, paravirtualisation is also important. There are significant gains from having guest operating systems that know they are guests and can request services from the host server or its assistance in performing demanding operations.

Some of the possible improvements will probably remain limited by competitive demands, such as the different virtual machine formats of all the main hypervisor vendors and their totally different driver architectures.

Historically, such issues have receded either when everyone has fallen into line with Microsoft's formats or when the functionality has moved into hardware.

That is still some way off for x86 virtualisation but it is a rapidly developing area, so watch this space. ®