It's time for Microsoft to revisit dated defaults
Among other things, Active Directory needs an overhaul
Sysadmin blog What works for 100 users frequently doesn't work for 10,000. The same is true in reverse; however, far fewer vendors worry about tailoring software designed for the enterprise to the needs of the SMB. True mass-market software needs to walk the tightrope between both worlds, and very little of it succeeds.
Let's consider Active Directory (AD) replication times as an example. By default, AD is scheduled to do inter-site replication every 180 minutes (three hours). This makes sense if your AD is enormous and one or more of your sites happens to live on the other end of connectivity from the past. An ISDN line, for example. Or perhaps a telegraph.
This value can be changed from the default to occur as frequently as once every 15 minutes. Here again, 15 minutes might be rational if your AD is some multi-tentacle hydra with fleventy-five domains in multiple forests bound together into a complicated and incomprehensible circus of insanity. Fortunately, most of us don't have to play that game, and thus 15 minutes represents quite a conservative minimum replication interval.
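To put rough numbers on those two settings, here is a back-of-the-envelope sketch in Python. It assumes, as a deliberate simplification, that a change can wait up to one full interval at every site link it has to cross; real latency also depends on schedules and topology. The function names and the two-hop example are illustrative only, not anything AD exposes.

```python
# Rough worst-case delay for a change to cross a chain of site links.
# Assumption: each hop waits up to one full replication interval.

DEFAULT_INTERVAL_MIN = 180  # default inter-site interval cited in the article
MINIMUM_INTERVAL_MIN = 15   # lowest value the standard tools accept


def worst_case_delay_minutes(interval_min: int, hops: int) -> int:
    """Upper bound, in minutes, under the one-interval-per-hop assumption."""
    if interval_min <= 0 or hops < 0:
        raise ValueError("interval must be positive and hops non-negative")
    return interval_min * hops


def cycles_per_day(interval_min: int) -> float:
    """How many replication cycles fit into 24 hours."""
    if interval_min <= 0:
        raise ValueError("interval must be positive")
    return (24 * 60) / interval_min


if __name__ == "__main__":
    for interval in (DEFAULT_INTERVAL_MIN, MINIMUM_INTERVAL_MIN):
        print(
            f"{interval:>3} min interval: "
            f"{cycles_per_day(interval):.0f} cycles/day, "
            f"up to {worst_case_delay_minutes(interval, 2)} min across 2 hops"
        )
```

Under that assumption, the default setting allows a change to take hours to reach a site two links away, while the 15-minute minimum still leaves a gap of up to half an hour.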
Indeed, for many multi-site businesses, a 15-minute replication window is increasingly an unacceptably long wait. The default 180-minute delay is simply absurd. Technology is in constant motion, and recent changes mean it's time for Microsoft to revisit many of its choices.
DNS is everything
The biggest issue with AD replication times is that AD-integrated DNS zones have to wait on the rest of the AD in order to replicate. This was fine back in 2000, when dynamic networks were an extreme novelty, but today's new technologies are all designed to be highly dynamic and are absolutely reliant on DNS.
The problem with DNS and AD is twofold. First: Microsoft's DNS servers are extremely user friendly, battle tested and reliable. More importantly, they integrate with other key infrastructure elements, most notably DHCP. Level whatever venom against Microsoft you wish, its DNS servers are an excellent choice for the role, with the result that a lot of critical infrastructure relies on them.
The second part of this is that increased adoption of IPv6, microservices, load balancers and so forth is driving a DNS-dependent dynamic infrastructure. Sysadmins aren't the only ones creating workloads these days. Developers and end users might be doing it, and even the machines are spinning up workloads on their own!
The velocity of change
Putting aside the DNS issues – as one can always use non-AD-integrated DNS solutions if needed – there are other reasons 15-minute AD replication times are an annoyance. The first is simply the time it takes for changes made by systems administrators to show up. Yes, we can go into Active Directory Sites and Services and manually trigger replication, but that's a pain.
Additionally, the combination of self-service interfaces and hybrid cloud solutions means that systems administrators increasingly have no idea what's occurring on their networks. We design the networks, but we don't necessarily know every time a user has added a new device, or from where.
One particularly bothersome example is that of a marketing executive who purchased a new notebook and registered it against the company's network using the provided cloud-based mobile device management service, as she had been instructed. The device was added, it began to receive emails, but her attempts to log in to the network failed.
The reason was that she had activated the notebook from one of the company's smaller sites. The cloud service synchronized against the head office's AD, but the changes didn't replicate out to the site where the marketing head was physically located until 15 minutes later.
The result? Some frantic phone calls to a help desk professional who had no idea what could cause this (devices are typically activated at head office), which resulted in the marketing exec's user account getting locked out. The marketing exec's shiny new notebook was thus not working in time for a major customer presentation and words were exchanged with IT. Loudly.
Fortunately, despite the GUI, the PowerShell commands and the official guidance all saying replication can only be set as frequently as every 15 minutes, there is a workaround. The trick is to enable Inter-site Change Notification for the relevant links.
Inter-site Change Notification essentially causes Active Directory to treat replication between AD servers located across site links as though they were in the same site. When a change is made it is immediately pushed across. (Actually, it takes between three and five seconds, which can make a difference for some high-churn applications, so be warned!)
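Under the hood, the switch is a bit flag: change notification is turned on by setting the lowest bit (value 1, USE_NOTIFY) in the `options` attribute of the site link object. The Python sketch below shows only the bit arithmetic, so that flags already set on the link are preserved rather than overwritten. It does not talk to a directory, and the helper names are made up for illustration.

```python
from typing import Optional

# Bit flags carried in a site link's "options" attribute.
USE_NOTIFY = 0x1           # enable inter-site change notification
TWOWAY_SYNC = 0x2          # two-way synchronisation
DISABLE_COMPRESSION = 0x4  # turn off compression of replication traffic


def enable_change_notification(current: Optional[int]) -> int:
    """Return the options value with USE_NOTIFY set, keeping other bits.

    An attribute that has never been set (None) is treated as 0.
    """
    return (current or 0) | USE_NOTIFY


def has_change_notification(current: Optional[int]) -> bool:
    """True if the USE_NOTIFY bit is already set."""
    return bool((current or 0) & USE_NOTIFY)


if __name__ == "__main__":
    for value in (None, 0, DISABLE_COMPRESSION, USE_NOTIFY | TWOWAY_SYNC):
        print(value, "->", enable_change_notification(value))
```

Using a bitwise OR rather than simply writing 1 matters on links where another option, such as disabled compression, is already in place.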
The problem with the solution
The problem with the solution is that in solving some very real problems (like DNS), we create others. The prominent example is: as goes AD replication, so goes GPO dissemination. Even experienced systems administrators have been known to kill entire networks with a bad GPO. The idea of every GPO change made instantly replicating throughout the AD fabric is pretty scary.
This issue isn't going to go away. It's only going to get more pressing. What's needed isn't simply a GUI toggle to enable Inter-site Change Notification, but a fundamental change in how AD behaves.
Today, AD is (mostly) an all-or-nothing affair. When AD replicates, it all replicates. (There are some exceptions, such as lockouts.) This needs to change.
Overall, individual applications or groups of applications need to be able to set different replication times. DNS using the AD infrastructure is grand, but it would be groovy if it could replicate independently of GPOs, for example. It would also be useful to specify the order in which services replicate, if they replicate together. DNS replicating ahead of GPOs, for example, would help to solve a lot of problems.
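To make the wish concrete, here is a purely hypothetical sketch in Python of what per-service replication policy might look like. Nothing like this exists in AD today; the class, the field names, the service labels and the intervals are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-service policy; not a real AD object."""
    service: str
    interval_min: int  # how often this service replicates, in minutes
    priority: int      # lower number replicates first when cycles coincide


def replication_order(policies: Sequence[ReplicationPolicy], minute: int) -> List[str]:
    """Names of the services due at `minute`, in priority order."""
    due = [p for p in policies if minute % p.interval_min == 0]
    return [p.service for p in sorted(due, key=lambda p: p.priority)]


# Illustrative values only: DNS and registrations move fast, GPOs slowly.
POLICIES = [
    ReplicationPolicy("gpo", interval_min=180, priority=2),
    ReplicationPolicy("dns", interval_min=1, priority=0),
    ReplicationPolicy("device-registration", interval_min=1, priority=1),
]

if __name__ == "__main__":
    for minute in (15, 180):
        print(minute, replication_order(POLICIES, minute))
```

In this toy model DNS and device registrations go out every minute, while GPOs keep a slow, deliberate cadence and always follow DNS when their cycles line up.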
Imagine if your hybrid cloud infrastructure could say "replicate this information throughout the AD fabric immediately, because it is a user/device registration" without triggering a fabric-wide reconvergence.
This is a major undertaking, and it's an open question whether or not Microsoft is interested in solving the problem. Traditionally, Microsoft only seems to engage with its customers when absolutely required, and even then only if a large enough customer makes a big enough noise. I don't yet see any large customers hollering about this.
Windows Server is oddly like looking at any jurisdiction's laws. It's a curious combination of product rigidity and seemingly bizarre default values that make no sense for the majority of individuals or businesses in the present day. Sadly, getting things changed may be just as hard as getting politicians to crack open the books and put out-of-date laws to bed. ®