Cheap as chips: The future of PC virtualisation
A brief history of virtualisation
VMware was founded in 1998, and until the launch of its eponymous product the next year, the PC’s x86 architecture had been considered impossible to fully virtualise.
Since that time, although VMware continues to prosper, prices of virtualisation tools have fallen to an all-time low – in fact, most hypervisors are free, with just the management tools costing money.
The canny are therefore asking if or when the bubble is going to burst. It looks like the classic hype cycle: following the "peak of inflated expectations" comes the "trough of disillusionment."
Some of the key weaknesses of current x86 virtualisation methods and technologies can be revealed by comparing PC hypervisors with those on mainframes and large Unix servers.
For instance, compared to mainframe partitioning, if you use a full PC server OS to run full-system VMs containing other full server or client OSs, the result is horribly inefficient.
Whole layers of the software stack are duplicated, once on the host and again inside each VM. It doesn’t even make a huge difference if the host runs Linux and the guests Windows, or the other way round – either way, there is functional duplication in the stack.
On an x86 server with guest OSs running under a hypervisor, a full copy of Windows (say) is running on an emulated chipset connected to emulated disk drives (formatted with a normal filesystem), and talking through emulated Ethernet cards to another real operating system – which is storing the VM images in another filesystem running on a real disk.
If you’re running current or recent Windows guests under Windows Server 2008 R2 with Hyper-V, then the Enlightened I/O drivers allow for dynamic memory sizing and reasonably efficient driver communication between guests and host – but there is still a lot of duplicated code, which is both wasteful and inefficient.
It means VMs run more slowly and take more disk space and memory. It also means that all the OSs in the stack must be patched and updated separately.
If you have half a dozen Windows servers running in VMs, then that is half a dozen copies of Windows that must be updated – and possibly a host copy as well. Even if tools such as WSUS ease the deployment, it still has to be done.
On the other hand, if the guests are running under VMware ESX or XenServer, then the host OS is relatively simple and lightweight, but the guests run on software emulations of complete systems; in VMware's case that includes a heavily-optimised emulation of the host CPU for running Ring 0 code. This still means a significant amount of emulation overhead.
A lot of duplicated code, which is wasteful and inefficient
Let’s consider what could be eliminated. In part three of this series, we looked at the Unix way of doing things: OS-level virtualisation. This means that only the userland of the OS is virtualised, with multiple userlands running atop a single kernel. One installed copy of the OS can appear to be dozens or more – but all sharing the same core binaries, the same memory and real native unvirtualised CPUs.
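As a very rough illustration of that shared-kernel principle, here is a minimal Python sketch built on plain chroot, the crude ancestor of OS-level virtualisation; real products such as Zones, Jails and Virtuozzo add resource controls and proper namespace isolation on top of this. The guest path is hypothetical, the tree is assumed to contain its own /bin/sh, and it needs root on a Unix host.

    import os

    GUEST_ROOT = "/srv/guests/web01"   # hypothetical, pre-populated userland tree

    pid = os.fork()
    if pid == 0:
        # Child process: confine it to the guest's own userland...
        os.chroot(GUEST_ROOT)
        os.chdir("/")
        # ...then hand over to a shell inside that tree. The process sees only
        # the guest's filesystem, yet shares the host kernel, scheduler, memory
        # manager and page cache with every other guest on the box.
        os.execv("/bin/sh", ["/bin/sh"])
    else:
        os.waitpid(pid, 0)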
Parallels’ Virtuozzo Containers brings this integral feature of Solaris, AIX and FreeBSD to Windows. Virtuozzo isn’t cheap, whereas Hyper-V and VMware are essentially free – but then again, half a dozen Virtuozzo VMs need no more RAM and disk than would be taken by installing all the apps in them straight on the host OS. The savings can be very considerable indeed, and the host server’s resources are shared equally by all the VMs – no partitioning or allocation is required.
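To put rough numbers on that, the back-of-envelope sums below use purely assumed footprints (2GB of RAM and 20GB of disk per full Windows instance, 1GB and 5GB per application load) rather than measurements:

    # Illustrative comparison only; every figure here is an assumption.
    GUESTS = 6
    OS_RAM_GB, OS_DISK_GB = 2, 20     # assumed footprint of one full Windows instance
    APP_RAM_GB, APP_DISK_GB = 1, 5    # assumed footprint of the apps in each guest

    full_vms   = (GUESTS * (OS_RAM_GB + APP_RAM_GB), GUESTS * (OS_DISK_GB + APP_DISK_GB))
    containers = (OS_RAM_GB + GUESTS * APP_RAM_GB, OS_DISK_GB + GUESTS * APP_DISK_GB)

    print(full_vms)    # (18, 150) GB of RAM and disk with one full OS per VM
    print(containers)  # (8, 50)   GB with a single shared OS underneath

The proportions will vary with the workload, but the shape of the saving is the point: the per-guest OS overhead is paid once, not once per VM.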
As the joke goes...
...when asked for directions to the manor house, the village idiot thought for a while and then said, "well, I wouldn't be starting from here".
Mainframe and UNIX virtualisation was designed into a well thought-out system. The guest OSs were generally of a similarly planned type.
As already pointed out, there are a lot of reasons for running x86 VMs that have nothing to do with per-user tailoring. That, after all, should be something the OS handles so that it generally 'just works'.
We have issues running older Windows and Linux versions due to security (we need the OS to run something legacy, but can't trust it on its own) and due to the loss of hardware support over time (hence the attraction of virtualised network cards, etc).
For other reasons, though, things could be improved by a less complex stack, and VM tools that allow hot migration of a running machine from server to server, etc, offer great advantages in uptime. Except, of course, when those tools come with bugs...
Good article, with a few omissions and oversights
It is good to see that finally somebody dares to speak the truth about the overheads and inefficiency of full virtualization. While the sales brochures boast overheads of a few percentage points, anybody who actually bothers to test this on a realistic workload will find that the overheads of full virtualization are in the region of 30-40%, and this applies across all PC virtualization products, be it VMware, Xen or KVM. But most people neither bother doing their own testing nor have enough understanding to apply the bare-metal optimizations that become virtually impossible (no pun intended) once virtualization is used.
One thing that is overlooked in the article is that KVM and Xen have certain advantages in terms of overheads. KVM uses the core features already built into the kernel (e.g. the scheduler), whereas Xen and VMware bring their own. Xen, however, has the ability to do half-virtualization, where the guest doesn't run a kernel of its own but relies on the host kernel to run the container's processes. But apart from only being able to run the same guest OS as the host, this still involves a container, which comes with more overheads than chroot-style virtualization, a-la OpenVZ (the free and open source project that Virtuozzo is based on), Linux Containers (LXC - not yet deemed stable, but it is in the mainline kernel), and Linux VServer (which has a killer feature over OpenVZ and LXC - copy-on-write hard-link file unification, which reduces memory usage, page cache usage and disk space all at the same time).
VServer's copy-on-write hard-link file unification is pretty much the mother of all memory deduplication approaches. For a start, it's free - once you unify the files by hard-linking them, all the executables and shared libraries will implicitly mmap to the same memory space (based on the inode number). That means if you have 100 guests, you only have 1 instance of glibc, rather than 100. This allows for some truly mind-boggling guest counts on a single host (hundreds). Best of all, there is no expensive run-time memory de-duplication required, a-la what VMware does or what KVM does using KSM (Kernel Samepage Merging) - it is all implicit. The savings in terms of disk space and caches (page cache, CPU cache) are a bonus on top.
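As a small demonstration of why the unification is free at run time, the Python sketch below hard-links one guest's copy of a library onto another's (the paths are hypothetical and the two files are assumed to be byte-identical); once they share an inode, mapping the library from either guest maps the same file and therefore the same page-cache pages.

    import os

    a = "/vservers/guest1/lib/libc-2.11.so"   # hypothetical paths to two
    b = "/vservers/guest2/lib/libc-2.11.so"   # byte-identical copies

    os.remove(b)   # drop guest2's private copy...
    os.link(a, b)  # ...and replace it with a hard link to guest1's

    sa, sb = os.stat(a), os.stat(b)
    print(sa.st_ino == sb.st_ino)   # True: one inode, one set of cached pages
    print(sa.st_nlink)              # the link count now covers both guests

As I understand it, the real VServer tooling goes further, marking unified files immutable so that a write inside one guest breaks the link and leaves that guest with a private copy instead of altering the shared one.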
Bochs isn't a virtualization tool; it is an emulator, and as such shouldn't be listed in the same group as the others. QEMU has the ability to do emulation, too, but it is also used as a front end for KVM (and KQEMU until recently), so it mostly earns itself a place among the virtualization tools.
Finally, switching between VM technologies is not particularly difficult. On UNIX OSes it is usually as simple as tarring the files across to a fresh container and re-installing the boot sector on the new VM.
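For the curious, here is a rough sketch of that using Python's tarfile module; the source and destination paths are purely illustrative, the guest should be stopped first, and it needs to run as root so ownership survives the trip.

    import tarfile

    SRC_ROOT = "/vz/private/101"            # e.g. an OpenVZ guest's private area
    DST_ROOT = "/var/lib/lxc/web01/rootfs"  # e.g. a freshly created LXC rootfs

    # Pack up the whole guest userland, preserving permissions and ownership.
    with tarfile.open("/tmp/guest.tar", "w") as tar:
        tar.add(SRC_ROOT, arcname=".")

    # Unpack it into the new container, keeping the original numeric UIDs/GIDs.
    with tarfile.open("/tmp/guest.tar", "r") as tar:
        tar.extractall(DST_ROOT, numeric_owner=True)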
The obvious that needed to be stated
You can't get a quart out of a pint pot.
That is all our grandmothers would bother to tell the PC virtualisation consultant as they showed him the door.
Unfortunately, I don't doubt that it will need to be stated again and again, as this technology continues to be mis-sold to managers who will not think to refer the matter to their grannies first.
Congratulations on an excellent series of articles. It has taught me a lot. I had some gut feelings about it: it is nice to have confirmation of them.