Before 'the cloud' was cool: Virtualising the un-virtualisable
A brief history of virtualisation
Buzzwords often have very short lifetimes in IT. Today it's cloud computing, but there would be no infinitely scalable cloud without the previous "big new thing": virtualisation. We take it for granted now, but it's worth remembering that it is still quite a new and relatively immature technology, with a long way to go.
In this and the next four articles, The Register looks at the history of virtualisation and what lessons we can learn from it about the technology's probable future.
How full-system virtualisation came to the PC
Microsoft, as is its wont, stuck virtualisation into Windows for free, based in part on the goodies it bought in when it acquired Connectix – whose Windows version of VirtualPC Redmond now also gives away for nothing.
With version 2.6.20, the Linux kernel gained its own virtualisation system: KVM, the Kernel Virtual Machine – nothing to do with Keyboard-Video-Mouse switches, but a whole-system virtualisation technique that uses Intel or AMD's hardware virtualisation instructions.
But although virtualisation only came to the PC in the late 1990s, it is nothing really new. IBM mainframes pioneered it in the 1960s and Sun and IBM's big Unix servers have been doing it for years. PCs have supported various forms of virtualisation for over 25 years and it was old back then. But the thing is, there are actually multiple types of virtualisation, not that you'd know that today.
Sexy: running multiple OSs side-by-side simultaneously on a single machine.
At heart, "virtualisation" in this context really just means running one operating system under another, so that a single machine can run multiple OSs side-by-side simultaneously. That's nothing new; you could run multiple copies of DOS, booted straight off floppy, under OS/2 v2.0 in 1992, and the 386 edition of Windows 2.1, known back then as Windows/386, used the virtual 8086 mode of the 80386 CPU to run multiple DOS applications side-by-side.
The Popek and Goldberg Virtualisation Requirements
Even before the 386, with its hardware support for virtualising 8086 code and operating systems, Locus Corporation's DOS Merge software allowed you to run MS-DOS as a task under various forms of Unix, even on an 80286.
Locus was founded by Gerald Popek, who in 1974 co-wrote "Formal Requirements for Virtualizable Third Generation Architectures" with Robert Goldberg – a document that encapsulated what came to be known as the Popek and Goldberg Virtualisation Requirements.
Although in its time it was very handy to be able to run DOS under Unix, or multiple DOS sessions under Windows or OS/2, it would be even more useful if you could run any full PC OS under any other. This is what is called "full system virtualisation," and according to Popek and Goldberg, the x86 architecture couldn’t do it.
Other processors could – it's no problem on IBM POWER or PowerPC, or on SPARC – but x86 didn't meet the second of the requirements: that the virtual machine monitor – the program running on the host OS – could remain in complete control of the hardware.
The problem is that certain instructions in the x86 instruction set can't be trusted – they directly affect the whole machine, so if a guest OS ran them, it would bring down the host OS (and thus all guests too) in a tumbling heap.
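One common way to state the requirement is that every "sensitive" instruction (one that reads or changes the state of the whole machine) must also be "privileged" (it traps when run outside the most privileged mode), so that the monitor always gets a chance to step in. The sketch below checks that condition for a small, deliberately incomplete sample of 32-bit x86 instructions. The function names are invented for this illustration, and the instruction lists are examples rather than a full catalogue.

```python
# Toy check of the Popek and Goldberg condition: classic trap-and-emulate
# virtualisation works only if every sensitive instruction is also
# privileged, i.e. it traps when run outside the most privileged mode.

# Illustrative, incomplete samples of 32-bit x86 instructions.
PRIVILEGED = {"LGDT", "LIDT", "LMSW", "HLT", "MOV CR3"}
SENSITIVE = PRIVILEGED | {"SGDT", "SIDT", "SLDT", "SMSW", "PUSHF", "POPF"}


def unsafe_instructions(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


def is_classically_virtualisable(sensitive, privileged):
    """True when the sensitive set is a subset of the privileged set."""
    return not unsafe_instructions(sensitive, privileged)


if __name__ == "__main__":
    print(is_classically_virtualisable(SENSITIVE, PRIVILEGED))
    print(unsafe_instructions(SENSITIVE, PRIVILEGED))
```

With these sample sets the check fails, because instructions such as POPF and SGDT touch machine-wide state but run without trapping in unprivileged code.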
Lord of the rings
Although it was hard to get around, the problem is simple to describe. All you have to understand is that computer processors run in several different modes, sometimes called "protection levels".
Classical x86 has four numbered "rings": 0, 1, 2 and 3. The higher the number, the lower the privilege. Programs running in ring 3 have to ask the operating system for resources, and can't see or touch stuff belonging to other programs in ring 3. Code in ring 0 is the boss, and can directly access the hardware, control virtual memory and so on. So, naturally, the core code of the OS itself runs in ring 0, as it is controlling the whole show.
(In fact, the vast majority of PC OSs only use rings 0 and 3. For the historically inclined, only two mainstream PC OSs ever actually used more than these two rings. One was IBM's OS/2: its kernel ran in ring 0 and ordinary unprivileged code in ring 3, as usual, but unprivileged code that did I/O ran in ring 2.
This is why OS/2 won’t run under Oracle’s open-source hypervisor VirtualBox in its software-virtualisation mode, which forces ring 0 code in the guest OS to run in ring 1. The other exception – depending on configuration – was Novell NetWare 4 and above, which could move NLMs into higher rings for better stability, or lower ones for more performance.)
The trick of full-system virtualisation is to find a way to take control away from privileged code that is already running in ring 0, so that another, even-more-privileged program can control it. If you can do this transparently – meaning that the guest does not know it's being manipulated – then you can run one operating system as a program under another. That gives you the ability to run more than one OS at a time on a single machine.
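As a rough illustration of the idea, here is a toy "trap-and-emulate" loop. It assumes an idealised CPU on which every privileged instruction traps when run outside ring 0, which is exactly what classic x86 does not guarantee. Every name in it (Trap, cpu_execute, Hypervisor and the instruction names) is hypothetical and invented for this sketch. It is not how any real hypervisor is written.

```python
# Toy trap-and-emulate loop. All names here are hypothetical; no real
# hypervisor exposes this interface.

class Trap(Exception):
    """Raised when deprivileged code attempts a privileged instruction."""

    def __init__(self, instruction, operand):
        super().__init__(instruction)
        self.instruction = instruction
        self.operand = operand


PRIVILEGED = {"DISABLE_INTERRUPTS", "ENABLE_INTERRUPTS", "HALT"}


def cpu_execute(instruction, operand, ring):
    """Pretend CPU: privileged instructions trap unless run in ring 0."""
    if instruction in PRIVILEGED and ring != 0:
        raise Trap(instruction, operand)
    return f"ran {instruction} in ring {ring}"


class Hypervisor:
    """Owns the real machine; the guest only ever changes virtual state."""

    def __init__(self):
        self.virtual_interrupts_enabled = True
        self.guest_halted = False
        self.log = []

    def emulate(self, trap):
        # Apply the instruction's effect to the guest's *virtual* CPU
        # state, leaving the real hardware untouched.
        if trap.instruction == "DISABLE_INTERRUPTS":
            self.virtual_interrupts_enabled = False
        elif trap.instruction == "ENABLE_INTERRUPTS":
            self.virtual_interrupts_enabled = True
        elif trap.instruction == "HALT":
            self.guest_halted = True
        self.log.append(f"emulated {trap.instruction}")

    def run_guest(self, program, guest_ring=1):
        # The guest kernel believes it is in ring 0 but really runs in a
        # less privileged ring, so its privileged instructions trap.
        for instruction, operand in program:
            if self.guest_halted:
                break
            try:
                self.log.append(cpu_execute(instruction, operand, guest_ring))
            except Trap as trap:
                self.emulate(trap)
        return self.log


if __name__ == "__main__":
    guest_kernel = [("ADD", 1), ("DISABLE_INTERRUPTS", None), ("HALT", None)]
    for line in Hypervisor().run_guest(guest_kernel):
        print(line)
```

Ordinary instructions run directly on the pretend CPU, while the privileged ones are caught and applied only to the guest's virtual state, so the guest never gets to switch off the real machine's interrupts or halt it.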
The result is a hierarchy of OSs. In the old days, early OSs were called by the rather more descriptive name of "supervisors". An OS is just another program, but it's the one that supervises other programs. What do you call a program that supervises multiple supervisors? A "hypervisor". The OSs running under it are called "guests" and the boxes they run in are "virtual machines" or VMs.
Is it just me...
...or does that post look like someone's testing a context-sensitive spambot?
One thing that can't be virtualised
There is one thing that can't be virtualised, and that is time - or at least it can't be virtualised where an OS has to interact with the real world. This can have some unfortunate side effects in terms of performance, clock slip and so on. For instance, any OS using "wall clock" time for things like timeouts, task switching and the like can produce some undesirable features on a heavily stressed machine. This is especially true when the hypervisor is able to page part of the guest environment. If "wall clock" time is used, erratic and very lengthy (by CPU standards) lumps of time then appear to have been used during execution.
In my experience, all OSs which are expected to run under hypervisors eventually have to be modified in some way to be "hypervisor aware" in order to iron out these wrinkles. Many years ago I worked on an OLTP operating system that ran under VM - in order to fix some timing issues it was necessary to modify some core timing functions in the guest OS to avoid using wall-clock time and get execution time information from the hypervisor.
You can get away with this stuff on lightly loaded environments, but not on heavily committed ones.
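The effect described above can be shown with a toy model. This is only a sketch under invented assumptions: times are in milliseconds, the "stolen" slices stand in for periods when the hypervisor has paged out or descheduled the guest, and the function name is hypothetical.

```python
# Toy model of a guest timeout under a hypervisor. Times are in
# milliseconds and every name is hypothetical.

def work_done_before_timeout(slices, timeout_ms, use_wall_clock):
    """Return the guest execution time (ms) when the timeout fires, or None.

    `slices` is a list of (kind, duration_ms) pairs, where kind is
    "run" (guest executing) or "stolen" (guest paged out or descheduled).
    """
    wall_clock = 0   # time as seen by the outside world
    guest_cpu = 0    # time the guest actually spent executing
    for kind, duration in slices:
        wall_clock += duration
        if kind == "run":
            guest_cpu += duration
        elapsed = wall_clock if use_wall_clock else guest_cpu
        if elapsed >= timeout_ms:
            return guest_cpu
    return None


if __name__ == "__main__":
    busy_host = [("run", 10), ("stolen", 200), ("run", 10), ("run", 80)]
    print(work_done_before_timeout(busy_host, 100, use_wall_clock=True))
    print(work_done_before_timeout(busy_host, 100, use_wall_clock=False))
```

With wall-clock time, the 100 ms timeout fires after the guest has done only 10 ms of work, because the 200 ms lump taken by the hypervisor is charged to it. Measured in execution time, the same timeout fires only once the guest has really run for 100 ms.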
Linux virtualisation is from before KVM
Linux has not one, but _THREE_ native virtualisation technologies:
1. User Mode Linux, which is from circa Y2K, long before KVM. Even if we count from the day when it acquired SKAS0 (or 3) support so it could have reasonable address space isolation, it is still pre-KVM
2. OpenVZ - also pre-KVM
3. KVM is the third one chronologically and, unless I am mistaken, it actually derives from QEMU and shares some code with it. So if we count the days of emulation into its history it also goes further back.
By the way, depending on what you want (and how good you are at C/Linux kernel drivers), KVM is quite often not the best fit for purpose either. Neither is Xen, nor is VMware. There are a lot of cases where OpenVZ (and even UML, if your kernel programming is good enough to fix its shortcomings) can do a better job.