Before 'the cloud' was cool: Virtualising the un-virtualisable
A brief history of virtualisation
Buzzwords often have very short lifetimes in IT. Today it's cloud computing, but there would be no infinitely scalable cloud without the previous "big new thing": virtualisation. We take it for granted now, but it's worth remembering that it is still quite a new and relatively immature technology, with a long way to go.
In this article, and the next four articles, The Register looks at the history of virtualisation and what lessons we can learn from it about the technology's probable future.
How full-system virtualisation came to the PC
Microsoft, as is its wont, stuck virtualisation into Windows for free, based in part on the goodies it bought in when it acquired Connectix – whose Windows version of VirtualPC Redmond now also gives away for nothing.
With version 2.6.20, the Linux kernel gained its own virtualisation system: KVM, the Kernel Virtual Machine – nothing to do with Keyboard-Video-Mouse switches, but a whole-system virtualisation technique that uses Intel or AMD's hardware virtualisation instructions.
But although virtualisation only came to the PC in the late 1990s, it is nothing really new. IBM mainframes pioneered it in the 1960s and Sun and IBM's big Unix servers have been doing it for years. PCs have supported various forms of virtualisation for over 25 years, and the idea was old even then. But the thing is, there are actually multiple types of virtualisation, not that you'd know that today.
Sexy: running multiple OSs side-by-side simultaneously on a single machine.
At heart, "virtualisation" in this context really just means running one operating system under another, so that a single machine can run multiple OSs side-by-side simultaneously. That's nothing new; you could run multiple copies of DOS, booted straight off floppy, under OS/2 v2.0 in 1992, and the 386 edition of Windows 2.1, known back then as Windows/386, used the Virtual 86 feature of the 80386 CPU to run multiple DOS applications side-by-side.
The Popek and Goldberg Virtualisation Requirements
Even before the 386, with its hardware support for virtualising 8086 code and operating systems, Locus Corporation's DOS Merge software allowed you to run MS-DOS as a task under various forms of Unix, even on an 80286.
Locus was founded by Gerald Popek, who in 1974 co-wrote "Formal Requirements for Virtualizable Third Generation Architectures" with Robert Goldberg – a document that encapsulated what came to be known as the Popek and Goldberg Virtualisation Requirements.
Although in its time it was very handy to be able to run DOS under Unix, or multiple DOS sessions under Windows or OS/2, it would be even more useful if you could run any full PC OS under any other. This is what is called "full system virtualisation," and according to Popek and Goldberg, the x86 architecture couldn’t do it.
Other processors could – it's no problem on IBM POWER or PowerPC, or on SPARC – but x86 didn't meet the second of the requirements: that the virtual machine monitor – the program running on the host OS – could remain in complete control of the hardware.
The problem is that certain instructions in the x86 instruction set can't be trusted – they directly affect the whole machine, so if a guest OS ran them, it would bring down the host OS (and thus all guests too) in a tumbling heap.
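The failure mode can be sketched with a toy model (the machine, its state and its two-instruction set are invented here purely for illustration). A well-behaved privileged instruction traps when executed in user mode, which lets a monitor catch the trap and emulate it; an x86-style offender silently does the wrong thing instead, so the guest's state quietly diverges and the monitor never gets the chance to step in:

```python
# Toy model of the Popek and Goldberg problem (all names invented for
# illustration). A VMM can only virtualise instructions that TRAP when run
# unprivileged; some x86 instructions misbehave silently instead.

class Trap(Exception):
    """Raised when a privileged instruction runs in user mode - the VMM's hook."""

def run(instr, mode, state):
    if instr == "HLT":                 # well-behaved: traps in user mode
        if mode == "user":
            raise Trap("HLT")          # the VMM catches this and emulates it
        state["halted"] = True
    elif instr == "POPF_IF":           # the offender: sensitive but unprivileged
        if mode == "user":
            pass                       # silently ignored - no trap, state diverges
        else:
            state["interrupts"] = False

def vmm_run_guest(instr, state):
    """The guest kernel *thinks* it is in ring 0, but really runs unprivileged."""
    try:
        run(instr, "user", state)
    except Trap:
        run(instr, "kernel", state)    # trap-and-emulate works for HLT

guest = {"halted": False, "interrupts": True}
vmm_run_guest("HLT", guest)       # trapped and emulated: guest halts as expected
vmm_run_guest("POPF_IF", guest)   # no trap: interrupt flag never changes - wrong
print(guest)
```

The guest asked to disable interrupts and believes it succeeded; the monitor never saw the instruction, so nothing it does afterwards can repair the damage. That, in miniature, is why classic x86 failed the second requirement.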
Lord of the rings
Although it was hard to get around, the problem is simple to describe. All you have to understand is that computer processors run in several different modes, sometimes called "protection levels".
Classical x86 has four numbered "rings": 0, 1, 2 and 3. The higher the number, the lower the privilege. Programs running in ring 3 have to ask the operating system for resources, and can't see or touch stuff belonging to other programs in ring 3. Code in ring 0 is the boss, and can directly access the hardware, control virtual memory and so on. So, naturally, the core code of the OS itself runs in ring 0, as it is controlling the whole show.
(In fact, the vast majority of PC OSs only use 0 and 3. For the historically-inclined, only two mainstream PC OSs ever actually used more than these two rings. One was IBM's OS/2: its kernel ran in ring 0 and ordinary unprivileged code in ring 3, as usual, but unprivileged code that did I/O ran in ring 2.
This is why OS/2 won't run under Oracle's open-source hypervisor VirtualBox in its software-virtualisation mode, which forces ring 0 code in the guest OS to run in ring 1. The other exception – depending on configuration – was Novell NetWare 4 and above, which could move NLMs into higher rings for better stability, or lower ones for more performance.)
The trick of full-system virtualisation is to find a way to take control away from privileged code that is already running in ring 0, so that another, even-more-privileged program can control it. If you can do this transparently – meaning that the guest does not know it's being manipulated – then you can then run one operating system as a program under another. That gives you the ability to run more than one OS at a time on a single machine.
The result is a hierarchy of OSs. In the old days, early OSs were called by the rather more descriptive name of "supervisors". An OS is just another program, but it's the one that supervises other programs. What do you call a program that supervises multiple supervisors? A "hypervisor". The OSs running under it are called "guests" and the boxes they run in are "virtual machines" or VMs.
Making the virtual indistinguishable from the real
The snag is, for this to be any use, you have to be able to do it invisibly. A VM has to be indistinguishable from a real machine. Unless the fakery is perfect, you cannot run an unmodified OS under it.
VMware's clever innovation was to find a way round this, by faking the functionality in software. The technique is related to the way that emulation of old computers makes it possible to, say, run an Amiga game on your PC. Emulators like WinUAE are a bit like reading a book in a language you don't speak by looking up every single word in a dictionary.
It works, and with a good enough dictionary, you can understand it – this is how online translators like Google Translate work. It's no substitute for understanding the language natively, though. When a computer runs an emulator, every single instruction of the emulated program is looked up and translated. The program runs, but slowly – tens of times more slowly than native code that doesn't need to be translated.
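That word-by-word dictionary lookup is exactly what an emulator's fetch-decode-execute loop does. Here is a minimal sketch, using a two-instruction bytecode invented for the purpose – the point is that every single guest instruction pays the cost of a dispatch through the interpreter:

```python
# A minimal fetch-decode-execute loop: the "dictionary lookup" an emulator
# performs for every single guest instruction (toy bytecode, invented here).

def emulate(program):
    acc, pc = 0, 0                     # accumulator and program counter
    while pc < len(program):
        op, arg = program[pc]          # fetch
        if op == "ADD":                # decode...
            acc += arg                 # ...and execute
        elif op == "MUL":
            acc *= arg
        elif op == "HALT":
            break
        pc += 1
    return acc

# (0 + 3 + 4) * 5 = 35, with an interpreter-level dispatch per instruction
result = emulate([("ADD", 3), ("ADD", 4), ("MUL", 5), ("HALT", 0)])
print(result)
```

Native code skips the fetch and decode steps entirely, which is why an interpreted guest runs tens of times slower than the same code run directly.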
What VMware did was find a way to examine executing PC code, isolate out the bits that run in ring 0 and run them, and only them, through an x86 emulator. Code in the less-privileged rings runs natively.
Oddly enough, it's possible to write a very efficient x86 emulator for the x86, as the chip is a very close architectural match to itself. Thus, x86-on-x86 emulation is much faster than, say, the inefficient process of emulating an Amiga's Motorola 68000 on an x86 chip, which only works because modern PCs are hundreds of times faster than mid-1980s computers.
Yes We Can
As with PowerQuest's PartitionMagic, once one company proved it could do something previously thought impossible, others realised that they could do it, too. For years, Connectix sold a Mac program called VirtualPC: a complete PC emulator for MacOS. VirtualPC let you run Windows – or any other PC OS – in a window on your non-x86-compatible PowerPC-based Mac. It worked, but it was slow.
After VMware, though, Connectix produced a PC version, aping VMware's separation of safe and unsafe code. Once you work out how to do it quickly, a PC emulator for the PC can be handy for all sorts of reasons – testing out new OSs, deploying "the right one for the job" even when you've got lots of jobs to do and there's no single "right" or "best" choice. For instance, running Windows on an Intel Mac under Mac OS X – or on a Linux PC.
This product is why Microsoft bought Connectix and it's the core technology behind "Windows XP Mode" on Windows 7 and Hyper-V on Windows Server.
The closest you could do on x86 without the software-emulation "cheat" was paravirtualisation, as done by the open-source Xen hypervisor. If the hardware doesn't have facilities to trap unsafe instructions, then you just trap what you can and modify the guest OSs not to use unsafe instructions. This is fine for open source OSs like Linux and the BSDs, but no use for commercial off-the-shelf stuff unless its makers cooperate with you and do special versions.
Once PC virtualisation took off, in 2005 the big x86 processor makers belatedly noticed which way the wind was blowing and added special instructions for hypervisors to their CPUs, creating a ring below ring 0: "ring minus-one."
This powers Linux's KVM, and Xen added support for it too, allowing Xen to run Windows in a VM – so long as you have a VT-capable processor. Sadly, VT is one of the features disabled in a lot of budget models of CPU. VMware still eschews it, feeling that its software virtualisation engine is more efficient.
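Whether a given machine has those extensions enabled is easy to check. On Linux (this sketch is Linux-specific), the kernel publishes the CPU's feature flags in /proc/cpuinfo: "vmx" marks Intel's VT-x and "svm" marks AMD's equivalent, AMD-V.

```python
# Check whether a CPU advertises hardware virtualisation support.
# Linux-specific: the kernel lists CPU feature flags in /proc/cpuinfo,
# where "vmx" means Intel VT-x and "svm" means AMD-V.

def vt_capable(cpuinfo_text):
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return "vmx" in flags or "svm" in flags
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("VT-capable" if vt_capable(f.read()) else "no VT extensions")
    except FileNotFoundError:
        print("no /proc/cpuinfo here (not Linux)")
```

If the flag is missing on a CPU that should have it, the feature may simply be switched off in the firmware – or, as noted above, fused off in a budget part.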
Now, virtualisation is everywhere. There's a wide choice of hypervisors for the x86. Parallels offers commercial ones, VMware offers both freeware and commercial options, and Microsoft gives away a range of them. For Linux, Xen and KVM offer a choice of free open-source hypervisor modules.
The thing is that they all do the same thing: split a system up into multiple independent subsystems, each of which can run its own complete OS – what's called "full-system virtualisation". This is not the only way to do it, and on more mature virtualisation platforms, it is a relatively rare method, because it is rife with inefficiencies and weaknesses – but if you know nothing else, they are hard to spot.
The problems only become apparent when you compare the way virtualisation is done on the PC with the ways that other systems perform it. ®
In the next part of this series, we'll look at the invention of virtualisation – 40 years ago, on IBM mainframes.