How do you manage service levels in a virtualised environment?
Business as usual or whole new ball game?
In previous research projects we’ve examined the impact of service level monitoring and management, and found positive benefits in having a range of associated SLAs in place. No surprises there of course.
However, the main finding was that beyond a certain number, it didn’t matter how much ‘extra agreement’ you had: there was a diminishing return in terms of how positively the business perceived IT. In other words, the harder you try to deliver, the harder things get.
It’s going to be interesting to see what will be the impact of virtualisation on this very real law of diminishing returns. In theory, a major benefit of server virtualisation is to enable an improvement in service levels above and beyond what can be achieved without it. Now that many organisations are taking their first steps into ‘mainstream’ virtualisation (i.e., beyond pilots and small scale initiatives), we’re starting to find out just how the relationship between virtualisation and service level management plays out.
What changes in thinking does virtualisation bring? There are a couple of major differences from IT’s point of view, notably around server provisioning: IT becomes able (in principle) to effect changes almost instantly compared to the pre-virtualisation era, when timelines for responding to new requirements were measured in weeks, not hours or minutes.
However, another factor is that despite it being possible, and indeed desirable, to manage a virtual environment without any reference to the physical world, adopters are discovering just how important it is to understand the virtual-physical divide – because there are still physical servers involved despite application logic being executed in virtual machines, and not least, because users still see ‘their’ applications as discrete entities, regardless of whether or not they exist in a virtual environment.
But are things really so different in practice, with virtualisation in the mix, when it comes to managing service levels?
There are, of course, a bunch of things that could make a difference.
A potential biggie is the architecture itself. We’re used to employing certain configurations to deliver pre-designated levels of scalability, performance and security in the physical world. Load balancing across multiple servers for example, or “2N+1” failover models, or database clustering, or defence in depth – all of these models rely on physical server configurations which don’t have a direct virtual equivalent. It doesn’t necessarily make sense to load-balance across multiple virtual servers if they are all going to be running on the same physical server, for example.
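The load-balancing trap above can be made concrete. As a hypothetical sketch (the function and data names are illustrative, not from any particular hypervisor’s API), here is a minimal anti-affinity check: given a map of VMs to physical hosts, it flags any load-balanced group whose members share a host – and therefore share a failure domain, making the promised redundancy illusory.

```python
# Hypothetical sketch: flag load-balanced VM groups whose members are
# colocated on the same physical host. Two balanced web servers on one
# box give you load spreading, but no protection against host failure.

def colocated_groups(placement: dict[str, str],
                     groups: dict[str, list[str]]) -> list[str]:
    """Return names of groups in which at least two VMs share a host."""
    flagged = []
    for name, vms in groups.items():
        hosts = {placement[vm] for vm in vms}
        if len(hosts) < len(vms):  # fewer distinct hosts than VMs
            flagged.append(name)
    return flagged

placement = {"web1": "hostA", "web2": "hostA",
             "db1": "hostB", "db2": "hostC"}
groups = {"web-lb": ["web1", "web2"], "db-cluster": ["db1", "db2"]}

print(colocated_groups(placement, groups))  # -> ['web-lb']
```

Most enterprise hypervisor suites express the same idea as anti-affinity rules; the point is that someone (or something) has to look across the virtual-physical divide to enforce them.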
The relationship between provisioning, procurement and service management may also be affected. In old money, it took a while to get new equipment in place – and corporate consumers of IT have been brought up on the principle of lead time. Even if equipment was available, it would still take days or weeks (or even months) to configure and deploy.
Virtualisation does indeed make such things much simpler – but the knock-on effect could so easily be that such best practices as asset management, configuration management and license management get lost along the way. Indeed, it remains to be seen whether the current ‘gold standards’ of IT management best practice – ITIL and COBIT – will cut it in the virtual world.
This brings us to the manner in which we can or could operate an environment containing a blend of physical and virtual domains. The monitoring, management and reporting activities which worked perfectly well in a more static, physical environment may simply not cut it as virtualisation becomes more widespread across the IT infrastructure. The question is, what should you do about it – or for those of you that have been there and done it, what did you do about it?
Something we’re very interested to hear about is how the emphases you place on these areas vary depending on what sort of company you are and how big your IT environment is. For example, if you are part of a fully resourced IT shop in a larger organisation you may have the luxury of being able to ‘over-provision’ your virtual environment in order to make sure that ‘the pool’ can withstand the demands placed on it.
Alternatively, someone working in a smaller IT shop with little or no margin may see virtualisation as way of squeezing every last drop of goodness from the IT resources they have, or conversely, see the additional micro management that may be required as an overhead they could do without. Whatever your situation, we’d love to hear. ®
It's so costly when you're the little guy...
Over-provisioning in smaller organisations is not only possible, but a necessity. When working with virtualisation, you still need spare hardware in case a node goes down.
Let me run you through an example, using the infrastructure of the company I work for. We have two types of network: one at each of our production sites, and one at our head office. Head office contains all the usual centralisable things. The production sites contain servers and services that simply can’t be centralised. (No surprises here.)
Our head office manages to cram all of its services into X physical servers. (Somewhere around X*25 VMs on those X servers, but only X physical boxen are active at a time.) For this, we keep a “cold spare” copy of our fastest VM server sitting around on the rack. If something goes boom, we move the VMs over to that server. We also have over-provisioned space on the existing servers so that if a second server should fail, we could absorb the hit by spreading the VMs of the second failed server across the cluster.
Our production sites can fit everything they need onto Y physical servers. To take advantage of a little extra performance that we don’t strictly *require,* but is nice to have, we spread the load to Y*2 physical boxes. Like the head office, we keep a spare around. Again, the spare swaps in for the first failure, and in a pinch we can collapse our sites from Y*2 active physical servers into Y.
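The failover arithmetic described above can be sketched in a few lines. This is a hypothetical illustration (uniform host capacity, known total VM demand – both simplifying assumptions, and the Y*2/Y figures are the commenter’s placeholders): can the surviving hosts still carry the workload after a given number of failures?

```python
# Hypothetical sketch of spare-capacity arithmetic for a VM cluster.
# Assumes every host has the same capacity and total VM demand is known.

def survives(n_hosts: int, host_capacity: float,
             vm_demand: float, failures: int) -> bool:
    """True if the hosts left after `failures` can still carry vm_demand."""
    remaining_capacity = (n_hosts - failures) * host_capacity
    return remaining_capacity >= vm_demand

# A site running Y*2 = 6 hosts whose workload would fit on Y = 3:
print(survives(n_hosts=6, host_capacity=1.0, vm_demand=3.0, failures=3))  # True
print(survives(n_hosts=6, host_capacity=1.0, vm_demand=3.0, failures=4))  # False
```

In other words, doubling the host count over the strict requirement buys headroom for multiple failures – exactly the “collapse from Y*2 down to Y” behaviour the commenter describes.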
Is it ideal? No. But we can’t really afford to be hosting somewhere in the realm of 25 times the number of physical systems in our various micro-datacenters either. So virtualisation was the only option for us. We do not have “big boy” budget. Everything we do is whiteboxed, and we are running ESXi (for lack of funding to purchase VMware’s management tools). When we deployed our VM infrastructure, buying SANs of adequate speed (10Gb iSCSI or Fibre Channel) for head office and all production sites was simply not an option. (Thus we use local storage on the physical server nodes.) This means that yes, if a server goes boom we have to move everything by hand. As a small business this means up to 4 hours downtime (worst case), with an average of 45 minutes any time a physical server vomits up a stick of RAM or drops a disk.
Counting both spares and active systems, we’ve around $N worth of virtualisation server hardware. Rough math says that if we wanted to get all the VMware management software to run our gear, we’d be asked to give VMware north of ($N + 30K). (That doesn’t mean $30K in addition to the cost of the hardware. It means the VMware management software would be $30K more than the cost of the hardware it would be managing!)
45 minutes of downtime every now and again, as well as the cost of a few spare chunks of hardware and a little over-provisioning, is something we can live with. The cost of the software to “elegantly” solve virtualisation provisioning issues isn’t.
Totally agree: virtualise the dev and functional test servers but keep the production and performance test servers physical except possibly for non-critical or marginal apps.
Same old: virtualisation is not a silver bullet but an extra tool in the box.
How many nines would sir like after the decimal point?
How do you manage service levels...?
Simple, you just lie.
Anyone who's used shared services knows that the smiles and promises last only until you sign.