Titanfall, shoot-'em-up gamers, cloudy contracts and cattle
Efficiency by numbers: It's wonderful, it's horrible and it is the future
Sysadmin blog
A DevOps-style, "servers are cattle, not pets" approach is the only feasible way for a small number of people to run modern cloud-scale data centres.
Small teams running large server farms are necessary to grind down costs so as to keep up with the Amazon-fearing race to the bottom. But even when cloud computing works exactly as designed, I am increasingly convinced these management styles are not always working as desired.
The real-world impact of the "cattle, not pets" approach to server management struck me during a conversation with co-worker Josh Folland, a professional video gamer who is working on a review of Titanfall. When I asked for his opinion of the game, his answer surprised me.
While he would easily rank Titanfall among the top five FPS games of all time on engine and gameplay alone, he spent a solid hour cursing Microsoft's Azure cloud and the concepts behind cloud computing in general. His rationale hit surprisingly close to home.
Efficiency by the numbers
The core of Josh's complaint is this: cloud servers are all about meeting quotas and SLAs, and they don't provide the same quality of play that fleets of community-managed dedicated servers once did.
In the old days, community members would band together into "clans", and each clan would rent a dedicated physical server at a colocation facility to run the game server for their favourite FPS.
These servers might be restricted to clan members or opened to the public with a given number of slots reserved for the clan. If you weren't a clan member and a clan member wanted a slot, you got bumped.
Clan servers had all the downsides of anything community-managed: petty infighting, lack of maintenance, absurd rules, mods, cheating, red tape and more. They were very much pets, not cattle; each server was babied by the clan administrator, tweaked and optimised until no two servers were alike.
Don't be kept prisoner by lag
Network lag is a great illustration of where pets beat cattle. In an FPS, lag is very, very bad. When you see the enemy soldier on your screen, you aim and click. Doing so sends a packet to the server, which then calculates whether or not you hit the fellow. The enemy troops, meanwhile, are moving around and probably shooting at you.
If your target has a ping of 35msec and you have a ping of 350msec, then he can execute 10 actions for every one of yours. He will quite literally be able to dodge bullets, and you won't see his bullets coming. In the modern FPS world, when two gamers of more or less equal skill meet, the gamer with the lower ping wins.
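The 10-to-1 figure is simple division. The sketch below makes that arithmetic explicit under the same simplification the paragraph uses: every action costs exactly one network round trip and nothing else. Real game netcode is considerably more complicated, so treat this as an illustration of the argument rather than a model of how Titanfall works. The function name is hypothetical.

```python
def actions_per_opponent_action(slow_ping_ms: float, fast_ping_ms: float) -> float:
    """Actions the low-ping player completes for each one the high-ping player
    completes, assuming (as a simplification) that every action costs exactly
    one round trip to the server and nothing else."""
    if slow_ping_ms <= 0 or fast_ping_ms <= 0:
        raise ValueError("pings must be positive")
    return slow_ping_ms / fast_ping_ms


print(actions_per_opponent_action(350, 35))  # 10.0
```

Under this assumption the ratio depends only on the two pings, which is why a 350msec player facing a 35msec player loses no matter how skilled he is.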
For all their faults, community-managed clan servers simply didn't have this problem. If you wanted to play with your buddies on a regular basis, you could all get a fast server with great connectivity located geographically near you. This would ensure that you all had low pings and experienced minimal lag when playing.
Titanfall doesn't work that way. Its multiplayer servers are a Microsoft Azure cloud-based affair. Players are assigned to a server by an algorithm, and that algorithm only really cares about meeting SLAs.
"Make sure that no more than X per cent of players per server have a ping higher than Y" or "make sure that CPU usage is below Z" – otherwise, spawn new instances and load-balance incoming players. It sounds good on paper; in practice, it falls short.
Getting a sub-50msec ping from here in Edmonton, Canada is rare, and whether you land a 250msec ping or one below 100msec is entirely the luck of the draw. For hardcore gamers, this turns what could have been one of the best games of skill yet created into a game of chance.
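The kind of rule quoted above can be sketched as a first-fit placement loop. Everything in this sketch is hypothetical: the `Server` class, the `meets_sla` and `place_player` functions, the first-fit strategy and the default limits (20 per cent of players above 150msec, CPU below 80 per cent) are assumptions chosen for illustration, not the actual Titanfall or Azure matchmaking code. What it shows is that a rule phrased purely in terms of X, Y and Z never asks which server is closest to the player in front of it.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical game-server instance (not a real Azure or Titanfall type)."""
    name: str
    cpu_load: float                                      # fraction of CPU in use, 0.0-1.0
    pings_ms: list[float] = field(default_factory=list)  # pings of players already on it


def meets_sla(server: Server, new_ping_ms: float,
              max_high_ping_fraction: float, high_ping_ms: float,
              max_cpu: float) -> bool:
    """True if adding one more player keeps the server inside both limits:
    at most X (a fraction) of players above Y msec, and CPU load below Z."""
    pings = server.pings_ms + [new_ping_ms]
    too_slow = sum(1 for p in pings if p > high_ping_ms)
    return (too_slow / len(pings) <= max_high_ping_fraction
            and server.cpu_load < max_cpu)


def place_player(servers: list[Server], ping_to: dict[str, float],
                 spawn_ping_ms: float,
                 x: float = 0.2, y: float = 150.0, z: float = 0.8) -> Server:
    """First-fit placement: the first server that stays within the SLA wins,
    whether or not it is the lowest-latency option for this player."""
    for server in servers:
        ping = ping_to[server.name]
        if meets_sla(server, ping, x, y, z):
            server.pings_ms.append(ping)
            return server
    # No existing server can absorb the player within the SLA: spawn a new instance.
    new_server = Server(name=f"instance-{len(servers)}", cpu_load=0.0,
                        pings_ms=[spawn_ping_ms])
    servers.append(new_server)
    return new_server


fleet = [Server("a", 0.5, [40, 60, 80, 90]), Server("b", 0.3, [30, 35])]
chosen = place_player(fleet, {"a": 250, "b": 40}, spawn_ping_ms=100)
print(chosen.name)  # a
```

In the usage example the player has a 40msec path to server `b` but lands on server `a` at 250msec, because one slow player out of five sits exactly at the 20 per cent limit. The SLA is met; the player's experience is not.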
People on a string
The "cattle, not pets" thinking that goes into designing a modern cloud application is – to me at least – the issue that needs addressing. In setting a series of scripted limits, we are really saying "we are okay with annoying this many people".
If the people running Titanfall are anything like those at the hundreds of other SaaS companies I've interviewed, then they are frustrated by discussions like the one above. In my experience, most SaaS developers believe strongly in the purported benefits of cloud computing, and they simply don't understand why end users are frustrated.
From a DevOps point of view, the solution is simple: tighten the SLAs, set the limits differently and tweak the matchmaking algorithm. Surely that would ensure a better experience for all. The infrastructure is dynamic and scalable, so once the perfect algorithms are found, the thing will basically run itself and all these silly complaints will simply go away.
Unfortunately, this will never solve the real problem. People don't want to be equal. We don't like "the luck of the draw." We want the option to be in control, even if we never exercise it.
The casual gamer is a different animal from the fellow dumping four hours a night into this game with his buddies. Someone catching a round a week on his Intel Graphics-powered notebook doesn't have the same expectations as the individual with a dual-Xeon rig sporting four video cards and a 240Hz 56" LCD.
In much the same fashion, cloud storage is a wonderful solution for a certain category of customer, but a terrible plan for many others. Companies that turn over terabytes of data every day but can't get anything better than an overworked ADSL connection certainly aren't going to be using it any time soon.
In this brave new cloudy world, we are all homogeneous. Just as servers are "cattle, not pets", we have become "numbers, not people". Perhaps the market opportunity for companies to differentiate themselves lies not in being cleverer about building automated, scalable infrastructure based on cold numbers and unfeeling logic. It may instead lie with the companies that swing the pendulum back towards catering to the individual customer once more. ®