Another big-iron model
If you need the guest OSs to be different from the host – for example, hosting multiple Windows XP guests – then another big-iron model could be applied. We looked at IBM mainframe-style virtualisation in part two of this series, where it's normal to run a specialised host OS on the bare metal, supporting specialised guest OSs that can only run inside VMs. How could this be applied to Windows?
If there were a special edition of Windows designed to run inside guests, what could be removed from it? In the days of MS-DOS, PCs were sometimes configured as "diskless workstations" – clients with no local hard drive that booted over the network from a server, then mounted a share as their C drive.
These days, the technology is called "OS streaming" and several companies offer it for modern versions of Windows, including Citrix Provisioning Server and Xstreaming Technology's VHD.
The technology combines well with things like Windows Server’s Volume Shadow Copy service. Multiple VMs can boot off a single shared drive with writes being redirected to another volume, whose contents can be discarded when the machines shut down.
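The redirect-on-write idea is simple enough to sketch in a few lines: reads fall through to a shared, read-only base image, while writes land in a per-VM overlay that is simply thrown away at shutdown. A toy illustration only – real products do this at the block-device level (differencing VHDs, copy-on-write backing files), and the class and method names here are invented:

```python
class CowDisk:
    """Toy copy-on-write disk: a shared read-only base image
    plus a per-VM overlay holding that VM's writes."""

    def __init__(self, base):
        self.base = base      # shared sector map, never modified
        self.overlay = {}     # this VM's writes, discarded at shutdown

    def read(self, sector):
        # A sector this VM has written shadows the base copy
        return self.overlay.get(sector, self.base.get(sector, b"\x00"))

    def write(self, sector, data):
        self.overlay[sector] = data   # the base image is never touched

    def shutdown(self):
        self.overlay.clear()          # all changes vanish


base = {0: b"boot", 1: b"data"}
vm = CowDisk(base)
vm.write(1, b"temp")
assert vm.read(1) == b"temp"          # the VM sees its own write
vm.shutdown()
assert vm.read(1) == b"data"          # back to the pristine image
assert base[1] == b"data"             # the shared base was never modified
```

Dozens of VMs can share one `base` this way, each with its own disposable overlay – which is exactly why writes can be discarded when the machines shut down.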
In this way, a dedicated guest edition of Windows wouldn't need emulated disk drives or indeed a filesystem of its own at all. It could just store its files natively in the host’s filesystem – one central set of binaries for dozens or hundreds of machines. It wouldn’t need an emulated network card, either, just a simple soft link to the hypervisor – no emulated chipsets or other complexities.
Stripping out the client OS like this would do away with several layers of indirection and emulation.
By the same token, the guest OS would only ever be served to a single user, so it would not need any integral support for creating multiple users, storing their profiles, switching between them and so on – all its data and configuration would be stored on a server anyway. It would need no hardware detection or device drivers of its own – any drivers it needed would be put in place when it was provisioned, the virtual hardware platform being essentially static and unchanging.
Memory allocation of VMs is getting more flexible with time. VMware ESXi lets you "overcommit" a server, assigning more RAM to VMs than the server actually has.
Meanwhile, if you run a version of Windows with "Enlightened I/O" under Microsoft’s Hyper-V, the memory size of the guest can be configured dynamically according to how much the host has free. However, there’s a more elegant method of sharing memory between host and guests than either of these.
The model to follow comes from a foreign platform, though. There are two different versions of the Linux kernel that are designed to run as programs under another OS: User Mode Linux and coLinux. UML is a version of the kernel rewritten to run as a userland program – i.e., in the processor's Ring 3 – under a parent Linux system.
To the parent OS, it appears as a single big process, but inside that process is a complete guest OS – no VM required. coLinux does superficially the same thing on a Windows host, although the implementation is very different.
The point is that if a kernel is designed to run under another OS, it can be written so as to request memory and other resources from the host system. With current x86 virtualisation, each guest needs an emulated memory controller, its own allocation of RAM and a complete emulated motherboard chipset – even if hypervisor-aware drivers improve the performance of running systems. None of this is necessary with a purpose-built kernel, which needs only a few simple drivers to handle communication with the host system.
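The difference is easy to caricature: instead of being handed a fixed slab of "emulated RAM" at boot, a guest-aware kernel could ask the host for pages as it needs them and hand them back when it doesn't. A toy sketch, with all names invented for illustration:

```python
class Host:
    """Host with a fixed pool of pages shared among cooperating guests."""

    def __init__(self, total_pages):
        self.free = total_pages

    def grant(self, n):
        # Hand out pages only if they are actually available
        granted = min(n, self.free)
        self.free -= granted
        return granted

    def reclaim(self, n):
        self.free += n


class Guest:
    """Guest kernel that borrows memory from the host on demand,
    rather than owning a fixed allocation of emulated RAM."""

    def __init__(self, host):
        self.host = host
        self.pages = 0

    def need(self, n):
        self.pages += self.host.grant(n)

    def release(self, n):
        n = min(n, self.pages)
        self.pages -= n
        self.host.reclaim(n)


host = Host(total_pages=100)
a, b = Guest(host), Guest(host)
a.need(60)
b.need(60)     # only 40 pages left, so b gets 40, not 60
assert (a.pages, b.pages, host.free) == (60, 40, 0)
a.release(30)  # a shrinks, freeing pages for the others
b.need(30)
assert (a.pages, b.pages, host.free) == (30, 70, 0)
```

The host's pool flexes between guests as their needs change – no fixed per-VM allocation, no emulated memory controller in sight.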
Virtualisation on x86 has a long way to go before it catches up
These are of course just idle speculations, but they give a flavour of how much smaller and simpler a custom "Guest Windows" could be, should Microsoft ever decide to build such a thing.
The key points to take away from all this?
No matter how mature it seems, virtualisation on x86 has a long way to go before it catches up with the systems that were doing it decades before it came to the PC.
There is already a specialised host version of Windows Server called Hyper-V Server, which like all of Microsoft's virtualisation tools is a free download. Even so, the potential benefits from a specialised guest version of Windows would be far greater.
But even if no such product ever appears, it would help if future versions of Windows could install in a special "guest" mode, with a kernel designed to be hypervisor-aware when running under another, host OS.
Even a full-fat edition of Windows would perform better in this mode if its kernel were able to communicate directly with the hypervisor rather than using multiple software emulations of PC hardware or even optimised drivers. Not only that, but it would be easier to manage and would use resources more efficiently.
Beyond this, full-system virtualisation is not the only way to do it, and there are persuasive advantages to operating-system level virtualisation as well. For some roles, where you expect to run identical host and guest OSs, OS-level virtualisation delivers much the same benefits but with dramatically lower resource usage and the management – and licensing – savings of a single system image to configure, maintain and patch.
A final thought is, sadly, perhaps the least likely to appear. There are already quite a few full-system virtualisation products for various operating systems: Bochs, QEMU, KVM, Xen, VMware, Parallels, VirtualBox and the various Microsoft offerings.
The Linux KVM hypervisor shares code with QEMU and thus shares a VM format. All the Microsoft offerings use a common format, too, derived from Connectix VirtualPC's VHD files. Most of the others, however, do not. The difference goes deeper than the arrangement of files in the host's filesystem: the virtual hardware made available to guests differs from one hypervisor to another, as well.
A single, common virtual hardware platform, at least, would be a big win – and a single on-disk format for virtual machines even better.
The PC industry does not have a good track record at adopting standard formats for interchange between rival systems – where there are standards, such as RTF files, they are at best secondary to products' native formats. If there is one common element, it tends to be everyone else adapting their products to read and write the Microsoft file formats.
It is to hypervisor vendors' advantage if their users are locked-in, though; it discourages a VMware house from migrating to Hyper-V, for instance. It would be very convenient for users if they could skip readily between different vendors' hypervisors, but don't hold your breath for that to happen any time soon. ®
As the joke goes...
...when asked directions to the manor house, the village idiot thought for a while, then said, "well, I wouldn't be starting from here".
Mainframe and UNIX virtualisation was designed into well thought out systems. The guest OSs were generally also of such a planned type.
As already pointed out, there are a lot of reasons for x86 VMs that have nothing to do with per-user tailoring. That, after all, should be something the OS generally 'just handles'.
We have issues running older Windows and Linux versions due to security (need the OS to run something legacy, can't trust it on its own) and due to the loss of hardware support with time (hence the attraction of virtualised network cards, etc).
For those reasons, things could be improved by a less complex stack; and VM tools that allow hot migration of a running machine from server to server offer great advantages in uptime. Except, of course, when those tools come with bugs...
Good article, with a few omissions and oversights
It is good to see that finally somebody dares speak the truth about the overheads and inefficiency of full virtualization. While the sales brochures boast overheads of low single-digit percentages, anybody who has actually bothered to test this on a realistic workload will find that the overheads of full virtualization are in the region of 30-40%, and this applies across all PC virtualization products, be it VMware, Xen or KVM. But most people neither bother doing their own testing nor have enough understanding to be able to apply optimizations on bare metal that become virtually impossible (no pun intended) when virtualization is used.
One thing that is overlooked in the article is that KVM and Xen have certain advantages in terms of overheads. KVM uses the core features already built into the kernel (e.g. the scheduler), whereas Xen and VMware bring their own. Xen, however, has the ability to do half-virtualization, where the guest doesn't run a kernel of its own but relies on the host kernel to run the container's processes. But apart from only being able to run the same guest OS as the host, this still involves a container, which comes with more overheads than chroot-style virtualization, a-la OpenVZ (the free and open source project that Virtuozzo is based on), Linux Containers (LXC - not yet deemed stable, but it is in the mainline kernel), and Linux VServer (which has a killer feature over OpenVZ and LXC - copy-on-write hard-link file unification, which reduces memory usage, page cache usage and disk space all at the same time).
VServer's copy-on-write hard-link file unification is pretty much the mother of all memory deduplication approaches. For a start, it's free - once you unify the files by hard-linking them, all the executables and shared libraries will implicitly mmap to the same memory space (based on the inode number). That means if you have 100 guests, you only have 1 instance of glibc, rather than 100. This allows for some truly mind-boggling guest counts on a single host (hundreds). Best of all, there is no expensive run-time memory de-duplication required, a-la what VMware does or what KVM does using KSM (Kernel Same-page Merge) - it is all implicit. The savings in terms of disk space and caches (page cache, CPU cache) are a bonus on top.
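The inode-sharing effect the commenter describes is easy to verify on any Unix-like system with nothing but the standard library: hard-linked files share an inode, so the page cache (and any mmap of the file) is shared too. A minimal demonstration, with invented paths standing in for two guests' root filesystems:

```python
import os
import tempfile

# Two guests' root filesystems sharing one copy of a library
root = tempfile.mkdtemp()
guest1 = os.path.join(root, "guest1")
guest2 = os.path.join(root, "guest2")
os.makedirs(guest1)
os.makedirs(guest2)

lib1 = os.path.join(guest1, "libc.so")
with open(lib1, "wb") as f:
    f.write(b"\x7fELF...")           # stand-in for a real shared library

# "Unify" guest2's copy as a hard link instead of a duplicate
lib2 = os.path.join(guest2, "libc.so")
os.link(lib1, lib2)

s1, s2 = os.stat(lib1), os.stat(lib2)
assert s1.st_ino == s2.st_ino        # same inode: one copy on disk...
assert s1.st_nlink == 2              # ...referenced from both guests
# ...and mmap()ing either path maps the same page cache pages.
```

With 100 guests unified this way, the on-disk and in-cache cost of each shared library stays at one copy, which is exactly where the "mind-boggling guest counts" come from.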
Bochs isn't a virtualization tool, it is an emulator and as such shouldn't be listed in the same group as the others. QEMU has the ability to do emulation, too, but it is also used as a front end for KVM (and KQEMU until recently), so it mostly earns itself a place among the virtualization tools.
Finally, switching between VM technologies is not particularly difficult. On UNIX OS-es it is usually as simple as tar-ing the files to a fresh container and re-installing the boot sector on the new VM.
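The tar-and-restore step is as mundane as it sounds; a sketch using Python's standard tarfile module, with temporary directories standing in for the old and new container roots:

```python
import os
import tarfile
import tempfile

# Stand-ins for the old container's root and a fresh one on the new host
old_root = tempfile.mkdtemp()
new_root = tempfile.mkdtemp()
os.makedirs(os.path.join(old_root, "etc"))
with open(os.path.join(old_root, "etc", "hostname"), "w") as f:
    f.write("web01\n")

# Archive the old container's filesystem...
archive = os.path.join(tempfile.mkdtemp(), "container.tar")
with tarfile.open(archive, "w") as tar:
    tar.add(old_root, arcname=".")

# ...and unpack it into the new container's root
with tarfile.open(archive) as tar:
    tar.extractall(new_root)

with open(os.path.join(new_root, "etc", "hostname")) as f:
    assert f.read() == "web01\n"    # files carried over intact
```

Re-installing the boot sector (for full VMs rather than containers) is the one extra step this sketch leaves out.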
The Obvious That Needed To Be Stated
You can't get a quart out of a pint pot.
That is all our grandmothers would bother to tell the PC virtualisation consultant as they showed him the door.
Unfortunately, I don't doubt that it will need to be stated again and again, as this technology continues to be mis-sold to managers who will not think to refer the matter to their grannies first.
Congratulations on an excellent series of articles. It has taught me a lot. I had some gut feelings about it: it is nice to have confirmation of them.