At last! Virtual domain controllers just work
VDCs in Server 2012 ease sysadmin headaches
Virtual domain controllers (VDCs) in Server 2012 – and now 2012 R2 – are awesome.
I have used domain controllers inside virtual machines since Virtual Server 2005 and have seen them fail in every way imaginable. VDCs address all of my issues and, considering the features they bring to the table, it is flat out nuts not to use this technology.
There are three primary scenarios where traditional domain controllers fail in a virtualised environment: restoring an individual domain controller from a backup into an existing environment; "oh damn"-class disaster recovery (where everything is coming from backups); and cloning.
To my delight and amazement VDCs cope with all three scenarios.
With the disaster recovery stuff Microsoft has created a new feature where version one is not crippled, half-assed or missing the features that made us want it in the first place.
If anyone of influence at Microsoft reads this, the people in charge of this project should be running the whole company. At the very least, buy them a tropical island. Maybe next to the one you should have bought the storage team by now.
I am a lot more skeptical about the cloning features. They strike me as being good for a narrow use case, while still missing the mark. Microsoft should not have used the word "cloning" here, as what most sysadmins think of as cloning and what Microsoft calls cloning intersect only tangentially.
Microsoft has some comprehensive documentation on the how and why of VDCs. This naturally includes some PowerShell examples for those who choose to script the process. It is worth bookmarking as you will need to reference it at some point.
Count the blessings
The heart and soul of VDCs is the VM-Generation ID. This is one of those "so simple it's brilliant" ideas that I wish we'd had years ago when virtualisation started to take off.
The simple version of the VM-Generation ID is a counter (strictly it is a 128-bit identifier, but the counter picture is close enough). One copy is kept in the virtual machine by the operating system and another is maintained by the host. Any time you do something that rewinds or duplicates the virtual machine – revert it to a snapshot, restore it from a backup, import a copy or what have you – the host's value changes.
If the counter inside the virtual machine is different from the counter maintained by the host the virtual machine knows that something has occurred beyond normal operation.
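The comparison is simple enough to sketch in a few lines of Python. This is a toy model of the counter idea only, not how any hypervisor implements it: the `Host` and `Guest` classes and their method names are invented for illustration, and applying a snapshot stands in for whichever host-side operations change the value.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypervisor side (hypothetical): holds the current value for one VM."""
    generation_id: int = 0

    def apply_snapshot(self) -> None:
        # Rolling the VM back is not normal operation, so the value changes.
        self.generation_id += 1


@dataclass
class Guest:
    """Operating-system side (hypothetical): remembers the last value it saw."""
    seen_generation_id: int = 0

    def something_happened(self, host: Host) -> bool:
        """Return True when the stored value no longer matches the host's."""
        return self.seen_generation_id != host.generation_id

    def acknowledge(self, host: Host) -> None:
        # After reacting to the mismatch, record the host's current value.
        self.seen_generation_id = host.generation_id
```

The guest never needs to know what the host did, only that the two values have drifted apart, which is what makes the idea reusable by software other than Active Directory.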
Now that VM-Generation ID exists, in theory any operating system could make use of it. I suspect its wide adoption is a matter of time, given how useful the concept is to any number of applications.
In the context of a Server 2012 domain controller, VM-Generation ID is used by the Active Directory service to determine if it should trust the local copy of the Active Directory. If the value of VM-Generation ID inside the virtual machine does not match that of the host then the Active Directory will discard its RID pool and reset its invocation ID.
In other words, any pending changes on that domain controller are not sent out to other domain controllers on the network; and the domain controller that has discovered its database is out of sync will fetch a clean copy from an unaffected domain controller.
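As a rough illustration of that reaction, here is a Python sketch of what a domain controller might do at boot. It is a simplified model under my own assumptions: the `DomainController` class, its fields and the `on_boot` method are hypothetical, the invocation ID handling is modelled as issuing a fresh ID, and the `needs_resync` flag stands in for the real replication work.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainController:
    """Hypothetical stand-in for the state a DC checks at start-up."""
    name: str
    stored_generation_id: int
    invocation_id: uuid.UUID = field(default_factory=uuid.uuid4)
    rid_pool: Optional[range] = None
    needs_resync: bool = False

    def on_boot(self, host_generation_id: int) -> bool:
        """Apply the safeguards if the host's value differs; report whether they ran."""
        if host_generation_id == self.stored_generation_id:
            return False  # normal start-up, the local database is trusted
        self.invocation_id = uuid.uuid4()   # partners treat this as a new database instance
        self.rid_pool = None                # never reuse RIDs that may already be issued
        self.needs_resync = True            # pull current data from a good DC
        self.stored_generation_id = host_generation_id
        return True
```

A matching value leaves everything alone; a mismatch trips all three safeguards in one go, which is why restoring a single domain controller from a snapshot stops being dangerous.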
This is great for people like me who don't have the licences to burn on making my domain controllers just domain controllers. Mine are generally DHCP servers and print servers as well.
They have done this job quite well for more than a decade, but once every three years or so a printer driver update will go sideways. The ability to just restore from a snapshot would be really useful.
It is also useful for those instances where Patch Tuesday touches your week with a borked Windows update. Because you stagger your domain controller update days… right?
I emphasise the importance of staggered updates because the ability to recover a virtual machine from snapshot or backup in this manner is dependent on there being a "good" domain controller on the network from which to fetch a clean copy of the Active Directory. You have to deal with things differently if you break all your domain controllers at the same time.
You broke them all?
You should take a slightly different approach to getting things up and running if you break all your domain controllers at the same time – less difficult than it sounds, especially for companies with few domain controllers. The short version is "bring up the domain controllers that own FSMO roles (flexible single master operations) first."
The first domain controllers up should be the PDC emulator followed by the RID master. In many instances they are one and the same system but they could just as easily be two different ones as they are separate FSMO roles.
These need to come up before anything else so that you have the core infrastructure of an Active Directory network up and running, at least enough for the domain controllers to chat among themselves and determine who is boss.
Bring up any remaining FSMO role domain controllers and make sure you have at least one global catalogue (GC) server. (GCs end up being important for the smooth operation of anything and everything in a Windows network.)
Manually trigger replication between these servers to make sure they can all talk among themselves. If one of them gives you grief a restart should get it syncing with the rest.
By this point you have at least one domain controller up that believes it is authoritative, and the Active Directory infrastructure required to replicate to further domain controllers is online and waiting.
Any additional domain controllers you bring online will behave just as in the previous section: they will wake up, realise something is wrong and grab a clean copy of the directory from the rest of the network.
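If you want that ordering written down before the bad day arrives, a small Python sketch can turn an inventory into a boot list. Everything here is an assumption for illustration: the `boot_order` function, the role strings and the inventory format are mine, and global catalogue placement is left out to keep the example short.

```python
from typing import Dict, List, Set

# Lower number boots first: PDC emulator, then RID master.
ROLE_PRIORITY = {"PDC emulator": 0, "RID master": 1}
OTHER_FSMO_PRIORITY = 2  # any remaining FSMO role holder
NO_ROLE_PRIORITY = 3     # plain domain controllers come up last


def boot_order(dcs: Dict[str, Set[str]]) -> List[str]:
    """Sort domain controller names by the most urgent FSMO role each one holds."""
    def priority(name: str) -> int:
        roles = dcs[name]
        if not roles:
            return NO_ROLE_PRIORITY
        return min(ROLE_PRIORITY.get(role, OTHER_FSMO_PRIORITY) for role in roles)

    # Tie-break on name so the result is stable and predictable.
    return sorted(dcs, key=lambda name: (priority(name), name))


inventory = {
    "dc3": set(),
    "dc2": {"RID master"},
    "dc1": {"PDC emulator"},
    "dc4": {"Schema master"},
}
print(boot_order(inventory))  # ['dc1', 'dc2', 'dc4', 'dc3']
```

A server holding both the PDC emulator and RID master roles simply sorts to the front, which matches the common case where they are one and the same system.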
For the curious, VM-Generation ID is supported in Hyper-V 3.0 and later as well as VMware vSphere 5.0 Update 2 and later. Commits were added to both the Xen and KVM development chains well over a year ago. If support hasn't already been patched into your favourite distro, it will be soon.
Bring on the clones
Not just anyone can make a clone. You need to get permission. What is more, there are all sorts of restrictions on how and when you can clone, and your clones might not quite end up being exact copies.
To clone a domain controller it must be added to the Cloneable Domain Controllers group. This can be done through the Active Directory Administrative Center. You also have to have your PDC emulator FSMO role running on a Server 2012 domain controller that is not the domain controller you are trying to clone.
Next up is Get-ADDCCloningExcludedApplicationList, a nice bit of PowerShell drudgery that gives you a list of all of the installed applications and services not known by Microsoft to be cloneable. It is then your job to run to the vendor and ask if these applications are designed to survive the cloning process.
You need to manually add each of the applications and services that you want to exist in the cloned copy of the domain controller to CustomDCCloneAllowList.xml. This goes alongside DCCloneConfig.xml which is generated via the New-ADDCCloneConfigFile PowerShell cmdlet.
DCCloneConfig.xml contains the configuration information for the eventual clone. This is mostly IP, DNS and Windows Internet Name Service configuration information.
You can specify static IPv4 addresses, but this option does not exist for IPv6. Microsoft has made it clear that it is stateless address autoconfiguration or nothing.
After all this is done you go to the source domain controller virtual machine you have just prepped and delete all of its snapshots. You must then power down the virtual machine and export it. Import it onto the target host and voila: a Microsoft domain controller clone.
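That is a lot of hoops, so a pre-flight checklist helps. The Python sketch below only restates the steps above as checks against facts you gather yourself; it does not talk to Active Directory or to a hypervisor. The `CloneCandidate` record and `blocking_issues` function are hypothetical, and treating any flagged application that is missing from the allow list as a blocker is my own conservative choice.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CloneCandidate:
    """Hypothetical record of one source domain controller's readiness."""
    name: str
    in_cloneable_group: bool
    pdc_emulator_host: str
    pdc_emulator_is_2012: bool
    flagged_apps: Set[str] = field(default_factory=set)       # from Get-ADDCCloningExcludedApplicationList
    allow_listed_apps: Set[str] = field(default_factory=set)  # entries in CustomDCCloneAllowList.xml
    has_clone_config: bool = False                            # DCCloneConfig.xml generated
    snapshot_count: int = 0
    powered_off: bool = False


def blocking_issues(dc: CloneCandidate) -> List[str]:
    """Return every reason the export should not go ahead yet."""
    issues = []
    if not dc.in_cloneable_group:
        issues.append("not in the Cloneable Domain Controllers group")
    if not dc.pdc_emulator_is_2012:
        issues.append("PDC emulator is not on Server 2012")
    if dc.pdc_emulator_host == dc.name:
        issues.append("PDC emulator is the machine being cloned")
    unreviewed = sorted(dc.flagged_apps - dc.allow_listed_apps)
    if unreviewed:
        issues.append("not in CustomDCCloneAllowList.xml: " + ", ".join(unreviewed))
    if not dc.has_clone_config:
        issues.append("DCCloneConfig.xml has not been generated")
    if dc.snapshot_count:
        issues.append("snapshots still present")
    if not dc.powered_off:
        issues.append("virtual machine is still running")
    return issues
```

An empty list means the source is ready to export; anything else is a to-do list in roughly the order the steps are normally carried out.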
My ideal clone would work a lot like good old Symantec Ghost from way back in the beforetime. Indeed, this functionality is built right into most virtualisation management tools today. You point it at the source system, go through a cute little wizard that usually boils down to "where do you want the clone to live" and a duplicate is created.
If you want to get really ambitious you can have your cloning wizard do basic customisation when injecting the new virtual machine: things like system name, IP address and so forth.
But it is not really a cloning process so much as a hazing and initiation ritual when you have to explicitly add applications and services you want to keep around (as in not be stripped out by the cloning process).
It reminds me of the old SIF files used to preconfigure the Windows XP installer – with perhaps a splash of virtualisation template – without quite making the whole leap.
Two additional issues move this out of the realm of what I would call a traditional clone. The first is that the clones cannot be integrated into the Active Directory structure if the original source virtual machine has been removed or demoted.
That is not behavior I would associate with a clone so much as a "golden master" style pseudo replica. The killer is that Microsoft has dire warnings about importing templates older than the tombstone/deleted object lifetime which has a default of 60 days.
I rebuild my domain controllers once every two or three years; 60 days isn't long enough for much of anything that I would do in day-to-day administration.
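The arithmetic is trivial, but it is the sort of check worth automating before anyone imports an old template. A minimal Python sketch, assuming you record the export date yourself: `template_is_stale` is a hypothetical helper, and the 60-day default simply mirrors the figure quoted above, so substitute whatever lifetime your own forest is configured with.

```python
from datetime import date, timedelta

DEFAULT_TOMBSTONE_DAYS = 60  # figure quoted in the article; check your own forest


def template_is_stale(exported_on: date, today: date,
                      tombstone_days: int = DEFAULT_TOMBSTONE_DAYS) -> bool:
    """Return True when the template is older than the tombstone lifetime."""
    return (today - exported_on) > timedelta(days=tombstone_days)
```

Anything that comes back stale should be regenerated from a live domain controller rather than imported.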
VDC to the rescue
Finding a use case for the disaster recovery portion of VDCs is unnecessary. Over time, the use cases will find you. The clone functionality sees the light of day primarily as a short-term deployment tool.
If you are setting up a new network – or expanding an old one – and need to deploy fairly large numbers of new domain controllers, then the VDC cloning is the right tool for the job.
If, like me, you add domain controllers only every other year (but are too lazy to build them from scratch except during major overhauls) the old-school "demote the source domain controller, dupe it and then re-promote the source and any descendants" is still the best option.
Another area where the cloning could potentially be very useful is in the tight-circle DevOps environments typically found in cloud development. Be it a public or private cloud, deployment horizons of new versions of a service tend to be fairly short in these circles.
In this scenario, the concept of regenerating your clone every few months is not that much of a bother and the forced inclusion of applications and services you want to keep is a feature, not a bug. DevOps teams rapidly iterate layers of test environments and ultimately production ones in a manner that would benefit from the details of this cloning process.
I'd love a multi-year shelf-life version of the clone that had a "go away Microsoft I'm cloning this because it is exactly the way I want it, I just need a copy with a different name and IP" button during the creation process.
Even if you upgrade nothing else to Server 2012, the virtualisation-aware featureset is hard to ignore.
Regardless of any beefs about cloning, the VM-Generation ID–based disaster recovery elements of the new VDC features mean that you would have to spend more time looking for reasons not to upgrade your domain controllers than building the case for doing so. ®