Let's play immutable infrastructure! A game where 'crash and burn' works both ways
Leavin' it to the Netflixes... for now
If you’ve ever had the misfortune to work as a systems administrator (and it doesn’t matter whether it’s a Windows or a Linux shop), you’ll know the feeling of logging on on a Monday morning, checking a few log files and noticing something’s not quite right.
It might be file systems filling up, log directories swamped by a spam attack, or an important OS update waiting to be applied. You’re a good sysadmin, so you fix the problems there and then. Everybody’s happy, but unwittingly you’ve created a major problem.
You’re managing dozens of machines. The majority are mostly the same, running the same OS and possibly the same applications (load-balanced web servers, for instance), and you’ve just updated one but not the others. Your infrastructure drifts: servers run slightly different versions, applications are not quite the same, and one day, during an update, it all comes crashing down.
OK, it might not be that bad, but you might be left with the nightmare of slightly different update procedures on each machine or, worse, update scripts that fail in some cases. There is one way to stop the problem happening: don’t let anyone log on to a server once it’s been set up. This is immutable infrastructure – it never changes.
If a sysadmin can log on to a machine, the temptation to change a setting (and perhaps fill in the change-log forms later, if there’s time) can be just too much to bear.
Immutable infrastructure stops this: you don’t alter machines, you dump them and replace them with new ones. The servers you need are built from scripts – something most likely to happen with cloud-based servers but, with careful installation procedures, something you can do locally too. Immutable infrastructure is a concept that has been hotly debated for some time.
The last thing your script does is turn off the SSH port (or whatever method your OS uses to let you log on), so you can’t log on to the machine any more. Your machines run like this until they are destroyed and replaced by a new script-built set.
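To make the idea concrete, here’s a toy model – not a real provisioning tool, and all the names are invented for illustration – of a server that is “sealed” as the final step of its build script, after which the only way to change it is to rebuild from the script:

```python
# Toy model of an immutable server: the build script configures it,
# then "turns off SSH" as its last step. After that, any attempt to
# change the machine fails -- the fix has to go into the build script
# and a fresh server. All names here are illustrative, not a real tool.

class Server:
    def __init__(self, image_version: str):
        self.image_version = image_version
        self.sealed = False        # log-in still possible during the build

    def configure(self, key: str, value: str) -> None:
        if self.sealed:
            raise RuntimeError("immutable: rebuild from the script instead")
        setattr(self, key, value)

    def seal(self) -> None:
        """Last step of the build script: disable log-ins for good."""
        self.sealed = True

web = Server("image-v42")
web.configure("role", "load-balanced web server")
web.seal()
try:
    web.configure("role", "ad-hoc Monday-morning fix")  # the old temptation
except RuntimeError:
    pass  # no quick edits: change the script, build a new server
```

The point of the sketch is the ordering: sealing is unconditional and last, so no later step (and no later sysadmin) can quietly mutate the machine.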
Of course, all these scripts will be under source control, so if need be they can be rolled back to a previous state. The point is that all the machines will always have the same configuration, the data will be safe on another disk (backed up, of course!) and in-house application software will be deployed by continuous deployment.
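A rollback in this world isn’t an edit – it’s redeploying an earlier, complete definition. A minimal sketch (the field names are assumptions, not any particular tool’s schema):

```python
# Sketch of versioned infrastructure definitions: each deployment is a
# complete, frozen description of the fleet, and rolling back means
# deploying an older definition whole -- never editing servers in place.
from dataclasses import dataclass

@dataclass(frozen=True)   # frozen: the definition itself can't be mutated
class Deployment:
    version: int
    os_image: str
    app_build: str

# Source-control history of deployments, oldest first (illustrative data).
history = [
    Deployment(1, "linux-5.4", "app-1.0"),
    Deployment(2, "linux-5.4", "app-1.1"),
]

current = history[-1]
# Bad release? Don't patch the servers: redeploy the previous definition.
rollback = history[-2]
```

Because every definition is immutable and kept, “what is running right now” is always answerable by pointing at one entry in the history.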
So now our infrastructure really is cattle and not pets. The sysadmin probably won’t know the names of most of the machines that make up the application; it will all be code numbers and version numbers. Old-timers like me will lament the passing of servers named after planets in Star Trek, but such is the price of progress.
Sharp readers will have noticed that there is a price to pay for this: if your infrastructure is code, then it will need testing before it is deployed. That means test infrastructure, test scripts to make sure everything works when it’s brought up, and a testing schedule to ensure all tests are covered and correct.
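What does such a test look like? At its simplest, a smoke test that compares a freshly built server’s reported state against its scripted definition before it takes traffic. A hedged sketch (a real suite would probe live endpoints; these checks and field names are assumptions):

```python
# Sketch of a smoke test for infrastructure-as-code: a newly built
# server must match its scripted definition before it goes into service.
# The expected values and keys are illustrative only.

EXPECTED = {"os": "linux-5.4", "app": "app-1.1", "ssh_enabled": False}

def smoke_test(reported_state: dict) -> list[str]:
    """Return the keys where a new server diverges from its definition."""
    return [
        key for key, want in EXPECTED.items()
        if reported_state.get(key) != want
    ]

good = {"os": "linux-5.4", "app": "app-1.1", "ssh_enabled": False}
bad  = {"os": "linux-5.4", "app": "app-1.0", "ssh_enabled": True}
```

An empty result means the build matches its script; anything else means the server is discarded and rebuilt – there is no “fix it by hand” path.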
However, this isn’t like pure software, which can generally be tested quickly; deploying a new server takes time, even with the fastest cloud operators, and that delay will be frustrating. Here’s an example: your services are under attack from some nasty hackers and there’s an OS fix to counter the attack. In the old days your admin team would deploy the fix across the server estate and embarrassment would be avoided. Now you need to go to the test environment, test the fix, and then redeploy. Time-to-fix has stretched.
The delay in fixing problems might not be the only problem: it’s fairly well accepted that the biggest problems with code are introduced during programming (even if the programmers are following requirements!).
Now that our infrastructure is code, deploying new servers is potentially susceptible to the same problem: without a thorough testing phase, the infrastructure scripts might be buggy. Worse still, what if there is a scripter with a grudge? Bugs in the infrastructure could be nasty – destroying data disks is just one example to bear in mind.
It’s worth bearing in mind that the immutable dream isn’t quite here yet. Companies have been working with Docker, AWS and Azure to make it happen, but there is no simple, cheap, off-the-shelf solution just yet – look at how much code Netflix has built (and open sourced) to make its platform work.
Immutability certainly works for the big boys, those with global operations, but for small and medium enterprises it’s almost certainly a step too far – for now. ®