Cheap as chips: The future of PC virtualisation
A brief history of virtualisation

VMware was founded in 1998, and until the launch of its eponymous product the following year, the PC’s x86 architecture had been considered impossible to fully virtualise.
Since that time, although VMware continues to prosper, prices of virtualisation tools have fallen to an all-time low – in fact, most hypervisors are free, with just the management tools costing money.
The canny are therefore asking if or when the bubble is going to burst. It looks like the classic hype cycle: following the "peak of inflated expectations" comes the "trough of disillusionment."
Some of the key weaknesses of current x86 virtualisation methods and technologies can be revealed by comparing PC hypervisors with those on mainframes and large Unix servers.
For instance, compared to mainframe partitioning, if you use a full PC server OS to run full-system VMs containing other full server or client OSs, the result is horribly inefficient.
Whole layers of the software stack are duplicated on both host and inside multiple VMs. It doesn’t even make a huge difference if the host runs Linux and the guests Windows, or the other way round – either way, there is functional duplication in the stack.
On an x86 server with guest OSs running under a hypervisor, a full copy of Windows (say) is running on an emulated chipset connected to emulated disk drives (formatted with a normal filesystem), and talking through emulated Ethernet cards to another real operating system – which is storing the VM images in another filesystem running on a real disk.
If you’re running current or recent Windows guests under a Windows Server 2008 R2 host with Hyper-V, then the Enlightened I/O drivers allow for dynamic memory sizing and reasonably efficient driver communication between guests and host – but there is still a lot of duplicated code, which is both wasteful and inefficient.
It means VMs run more slowly and take more disk space and memory. It also means that all the OSs in the stack must be patched and updated separately.
If you have half a dozen Windows servers running in VMs, then that is half a dozen copies of Windows that must be updated – and possibly a host copy as well. Even if tools such as WSUS ease the deployment, it still has to be done.
On the other hand, if the guests are running under VMware ESX or XenServer, then the host OS is relatively simple and lightweight, but the guests run on software emulations of complete systems – in VMware’s case including heavily-optimised emulation of the host CPU for running Ring 0 code. This adds a significant amount of overhead.
A lot of duplicated code, which is wasteful and inefficient
Let’s consider what could be eliminated. In part three of this series, we looked at the Unix way of doing things: OS-level virtualisation. This means that only the userland of the OS is virtualised, with multiple userlands running atop a single kernel. One installed copy of the OS can appear to be dozens or more – but all sharing the same core binaries, the same memory and real native unvirtualised CPUs.
Parallels’ Virtuozzo Containers brings this integral feature of Solaris, AIX and FreeBSD to Windows. Virtuozzo isn’t cheap, whereas Hyper-V and VMware are essentially free – but then again, half a dozen Virtuozzo VMs need no more RAM and disk than would be taken by installing all the apps in them straight on the host OS. The savings can be very considerable indeed, and the host server’s resources are shared equally by all the VMs – no partitioning or allocation is required.
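The resource arithmetic behind that claim can be sketched roughly. This is a back-of-the-envelope model only – all the figures below are invented for illustration, not vendor numbers:

```python
# Illustrative model (all figures are assumptions, not vendor numbers):
# compare the RAM footprint of six workloads run as full-system VMs
# versus OS-level containers on one host.

GUEST_OS_RAM_MB = 512    # assumed RAM per guest OS instance
APP_RAM_MB = 256         # assumed RAM for the application itself
HOST_OS_RAM_MB = 1024    # assumed RAM for the host OS / hypervisor
WORKLOADS = 6

# Full-system virtualisation: every VM carries its own OS copy.
full_vm_total = HOST_OS_RAM_MB + WORKLOADS * (GUEST_OS_RAM_MB + APP_RAM_MB)

# OS-level virtualisation: one shared kernel and userland, so each
# container adds little more than the application's own footprint.
container_total = HOST_OS_RAM_MB + WORKLOADS * APP_RAM_MB

print(full_vm_total)    # 5632 MB
print(container_total)  # 2560 MB
```

Even with these generous assumptions, the container model uses less than half the memory, and the gap widens as the per-guest OS footprint grows.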
Another big-iron model
If you need the guest OSs to be different from the host – for example, hosting multiple Windows XP guests – then another big-iron model could be applied. We looked at IBM mainframe-style virtualisation in part two of this series - where it’s normal to run a specialised host OS on the bare metal, supporting specialised guest OSs that can only run inside VMs. How could this be applied to Windows?
If there were special editions of Windows for running inside guests, let’s look at what could be removed. In the days of MS-DOS, PCs were sometimes configured as “diskless workstations” – clients with no local hard drive that booted over the network from a server, then mounted a share as their C drive.
These days, the technology is called “OS streaming” and several companies offer implementations for modern versions of Windows, including Citrix Provisioning Server and Xstreaming Technology’s VHD.
The technology combines well with things like Windows Server’s Volume Shadow Copy service. Multiple VMs can boot off a single shared drive with writes being redirected to another volume, whose contents can be discarded when the machines shut down.
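The write-redirection mechanism is easy to picture in miniature. Here is a toy sketch (the class and method names are invented for illustration): many machines boot from one shared, read-only base image, each machine’s writes land in a private overlay, and the overlay is simply thrown away at shutdown:

```python
# Toy sketch of boot-image write redirection: a shared read-only base
# image plus a per-machine overlay for writes, discarded at shutdown.
# All names here are invented for illustration.

class StreamedDisk:
    def __init__(self, base_image):
        self.base = base_image      # shared, read-only block map
        self.overlay = {}           # this machine's redirected writes

    def read(self, block):
        # Prefer the machine's own writes, fall back to the shared image.
        return self.overlay.get(block, self.base.get(block))

    def write(self, block, data):
        self.overlay[block] = data  # never touches the shared image

    def shutdown(self):
        self.overlay.clear()        # discard changes, shadow-copy style

base = {0: b"bootloader", 1: b"kernel"}
vm1, vm2 = StreamedDisk(base), StreamedDisk(base)
vm1.write(1, b"patched kernel")
print(vm1.read(1))  # b'patched kernel'  (vm1 sees its own write)
print(vm2.read(1))  # b'kernel'          (vm2 still sees the base)
vm1.shutdown()
print(vm1.read(1))  # b'kernel'          (changes discarded at shutdown)
```

One base image thus serves any number of machines, and no machine can corrupt it for the others.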
In this way, a dedicated guest edition of Windows wouldn't need emulated disk drives or indeed a filesystem of its own at all. It could just store its files natively in the host’s filesystem – one central set of binaries for dozens or hundreds of machines. It wouldn’t need an emulated network card, either, just a simple soft link to the hypervisor – no emulated chipsets or other complexities.
Stripping out the client OS like this would do away with several layers of indirection and emulation.
By the same token, the guest OS would only ever be served to a single user, so it would not need any integral support for creating multiple users, storing their profiles, switching between them and so on – all its data and configuration would be stored on a server anyway. It would need no hardware detection or device drivers of its own – any drivers it needed would be put in place when it was provisioned, since the virtual hardware platform is essentially static and unchanging.
Memory allocation of VMs is getting more flexible with time. VMware ESXi lets you "overcommit" a server, assigning more RAM to VMs than the server actually has.
Meanwhile, if you run a version of Windows with "Enlightened I/O" under Microsoft’s Hyper-V, the memory size of the guest can be configured dynamically according to how much the host has free. However, there’s a more elegant method of sharing memory between host and guests than either of these.
The model to follow comes from a foreign platform, though. There are two different versions of the Linux kernel that are designed to run as programs under another OS: User Mode Linux and coLinux. UML is a version of the kernel rewritten to run as a userland program – i.e., in the processor’s Ring 3 – under a parent Linux system.
To the parent OS, it appears as a single big process, but inside that process is a complete guest OS – no VM required. coLinux does superficially the same thing on a Windows host, although the implementation is very different.
The point being that if a kernel is designed to run under another OS, it can be written so as to request memory and other resources from the host system. With current x86 virtualisation, each guest needs an emulated memory controller, its own allocation of RAM and a complete emulated motherboard chipset – even if hypervisor-aware drivers improve the performance of running systems. None of this is necessary with a purpose-built kernel, which needs only a few simple drivers to handle communication with the host system.
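The cooperative approach can be sketched in a few lines. In this toy model (all names invented for illustration), the guest kernel simply calls into its host for pages as it needs them, and hands them back when idle – no emulated memory controller anywhere:

```python
# Sketch (invented names) of the UML/coLinux idea: a guest kernel that
# asks its host for memory on demand, instead of owning a fixed bank
# of emulated RAM behind an emulated memory controller.

class Host:
    def __init__(self, free_pages):
        self.free_pages = free_pages

    def grant(self, n):
        granted = min(n, self.free_pages)   # give what is available
        self.free_pages -= granted
        return granted

    def release(self, n):
        self.free_pages += n

class CooperativeGuestKernel:
    """Runs as an ordinary process on the host; no emulated chipset."""
    def __init__(self, host):
        self.host = host
        self.pages = 0

    def need_memory(self, n):
        self.pages += self.host.grant(n)    # a simple call, not emulation

    def go_idle(self, n):
        self.host.release(n)                # hand unused pages straight back
        self.pages -= n

host = Host(free_pages=1000)
guest = CooperativeGuestKernel(host)
guest.need_memory(300)
print(host.free_pages, guest.pages)  # 700 300
guest.go_idle(100)
print(host.free_pages, guest.pages)  # 800 200
```

Compare this with ballooning: here memory flows both ways as a matter of course, because the guest kernel was designed from the start to be a tenant rather than a landlord.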
Virtualisation on x86 has a long way to go before it catches up
These are of course just idle speculations, but they give a flavour of how much smaller and simpler a custom "Guest Windows" could be, should Microsoft ever decide to build such a thing.
The key points to take away from all this?
No matter how mature it seems, virtualisation on x86 has a long way to go before it catches up with the systems that were doing it decades before it came to the PC.
There is already a specialised host version of Windows Server called Hyper-V Server, which, like all of Microsoft's virtualisation tools, is a free download. Even so, the potential benefits from a specialised guest version of Windows would be far greater.
But even if no such product ever appears, it would help if future versions of Windows could install in a special "guest" mode, with a kernel designed to be hypervisor-aware when running under another, host OS.
Even a full-fat edition of Windows would perform better in this mode if its kernel were able to communicate directly with the hypervisor rather than using multiple software emulations of PC hardware or even optimised drivers. Not only that, but it would be easier to manage and would use resources more efficiently.
Beyond this, full-system virtualisation is not the only way to do it, and there are persuasive advantages to operating-system level virtualisation as well. For some roles, where you expect to run identical host and guest OSs, OS-level virtualisation delivers much the same benefits but with dramatically lower resource usage and the management – and licensing – savings of a single system image to configure, maintain and patch.
A final thought is, sadly, perhaps the least likely to come to pass. There are already quite a few full-system virtualisation products for various operating systems: Bochs, QEMU, KVM, Xen, VMware, Parallels, VirtualBox and the various Microsoft offerings.
The Linux KVM hypervisor shares code with QEMU and thus shares a VM format. All the Microsoft offerings use a common format, too, derived from Connectix VirtualPC's VHD files. Most of the others, however, do not. The difference goes deeper than the arrangement of files in the host's filesystem: the virtual hardware made available to guests differs from one hypervisor to another, as well.
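The VHD format mentioned above is at least publicly documented, which shows how simple a common on-disk format could be. As a sketch, here is a minimal fixed-disk VHD footer built and parsed from the published layout: a 512-byte structure with a "conectix" cookie, big-endian fields, and a one's-complement checksum. (Several fields, such as the timestamp and CHS geometry that real tools compute, are left zeroed here for brevity.)

```python
# Sketch of a minimal fixed-disk VHD footer, following the published
# layout: 512 bytes, "conectix" cookie, big-endian fields, one's-
# complement checksum computed with the checksum field zeroed.
# Timestamp, creator fields and CHS geometry are left zero for brevity.
import struct
import uuid

def make_vhd_footer(disk_size):
    footer = bytearray(512)
    footer[0:8] = b"conectix"                                # cookie
    struct.pack_into(">I", footer, 8, 2)                     # features (reserved bit)
    struct.pack_into(">I", footer, 12, 0x00010000)           # format version 1.0
    struct.pack_into(">Q", footer, 16, 0xFFFFFFFFFFFFFFFF)   # fixed disk: no data offset
    struct.pack_into(">Q", footer, 40, disk_size)            # original size
    struct.pack_into(">Q", footer, 48, disk_size)            # current size
    struct.pack_into(">I", footer, 60, 2)                    # disk type 2 = fixed
    footer[68:84] = uuid.uuid4().bytes                       # unique ID
    checksum = (~sum(footer)) & 0xFFFFFFFF                   # checksum field still zero here
    struct.pack_into(">I", footer, 64, checksum)
    return bytes(footer)

def parse_vhd_footer(footer):
    assert footer[0:8] == b"conectix", "not a VHD footer"
    current_size = struct.unpack_from(">Q", footer, 48)[0]
    disk_type = struct.unpack_from(">I", footer, 60)[0]
    return current_size, disk_type

footer = make_vhd_footer(64 * 1024 * 1024)
size, disk_type = parse_vhd_footer(footer)
print(size, disk_type)  # 67108864 2
```

A fixed-size VHD is just the raw disk image with this footer appended – a level of simplicity the rival formats could easily have converged on.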
A single, common virtual hardware platform, at least, would be a big win – and a single on-disk format for virtual machines even better.
The PC industry does not have a good track record of adopting standard formats for interchange between rival systems – where there are standards, such as RTF files, they are at best secondary to products' native formats. If there is one common element, it tends to be everyone else adapting their products to read and write the Microsoft file formats.
It is to hypervisor vendors' advantage if their users are locked in, though; it discourages a VMware house from migrating to Hyper-V, for instance. It would be very convenient for users if they could switch readily between different vendors' hypervisors, but don't hold your breath for that to happen any time soon. ®