Is server virtualization ready for production?
Beyond the low hanging fruit
Lab

The adoption of server virtualization technology follows several trajectories. We could consider the breadth of its penetration, in terms of the number of organizations using it today, or we could consider the depth of its use in individual organizations, in terms of what they actually do with it.
The latter is of more relevance to IT departments that have already taken first steps down the server virtualization path. For these organizations, if we think beyond the workloads already being run in a virtualized environment, is there a ‘next’? And if there is, what is it?
Perhaps this is simplistic. When you think about virtualization, do you think in terms of the proportion of the x86 server estate in your datacenter that ‘could’ be virtualized, or do you think about the different types of workloads that need executing? Vendors with a vested interest in shifting virtualization technology tend to presume that anything that could be a candidate for virtualization automatically will be.
However, we know from Reg reader research that decision-making is typically focused on a simpler criterion: can it save money right now? As a result, virtualization tends to be employed for the more straightforward workloads that can be easily consolidated.
Admittedly, the notion of ‘straightforward’ is relative, although there are some commonly accepted candidates such as print servers, web servers and the like. Whether these are chosen because they are seen as cost-saving, low-risk, ‘non-core’ or ‘non-critical’ areas, it’s where most organizations cut their teeth. So where do we go from here? The answer has to be into areas of higher potential risk, and less evident cost-benefit. So then: what is the rationale for making decisions?
Work up the list
To reiterate, the factors at play are: cost savings; virtualization benefit; business importance; and migration risk. Does IT simply ‘work up the list’ from least risk/importance? Or are those with prior experience now applying virtualization to areas which would benefit specifically from it, regardless of their importance to the business?
Factors around migration risk raise the question of whether enough experience and confidence exists in the technology itself and in the surrounding systems (availability, resilience, and backup and recovery), as well as in the skills of the IT department itself, to consider higher-risk workloads as virtualization candidates.
One must also take into consideration the socio-political aspects of IT ownership. A line of business leader might have concerns about ‘his’ application running in a virtualized environment, even if he's perfectly happy with the service he gets from ‘lower value’ services. But if the technology is proven elsewhere, what’s the fuss?
Part of the answer could lie in how big the first step down the virtualization route was. Did the IT department have to fight to make it happen, or did someone in the business request it directly or indirectly – e.g., a demand that could only be fulfilled by employing technology in this way?
One argument suggests that had it not been for the economic crisis in 2008, many organizations would not have felt it necessary to virtualize any server infrastructure.
So, if you have moved virtualization beyond the pilot, how did you decide? Was it via the same process you employed the first time you decided to take advantage of this technology? Did it involve a complex risk management exercise or was it more about gut feel and trust in your collective abilities and the technology itself?
If you’ve already taken the ‘next step’, or are thinking about it, we’d like to hear about the decision-making processes you or your department have been working through, and your experiences of migrating ‘next level’ workloads into the virtual environment so far.
Who ever gets to decide these sorts of things?
Necessity is the mother of invention. We had to go virtual on the servers due to space and power constraints. Our existing app software simply couldn't fully utilise a server, and the devs were completely unhelpful in bringing their code up to speed.
On the VDI side, we had a line-of-business app on which we were entirely dependent that had some unique constraints. The "polling" it had embedded to allow servers to operate across multiple physical sites was absolute shite, and the devs could never get it to work. Their solution was always "use terminal services to serve the application out." The problem with that approach was that if one user managed to tank the app, it would tank the app in every session for every user on the terminal server.
This left us no route but to deploy VDI, install the app in every user's VM, and tie them all back to a "single site" server.
So... choice? Not really. Virtualisation was quite simply the only route we had available. Thankfully, it's proven itself a godsend. Well, okay, let me qualify that. Once we abandoned MS Virtual Server 2005 and VMware Server 2 and moved every single one of our servers over to ESXi 4, it was fantabulous. As the old saying goes... never buy version 1 of anything. (In fact, wait a few versions for someone else to take the bullets on crappy apps for you.)
Ready for production? We couldn’t live without it.
We've taken a few steps in this direction...
My workplace has a number of 'small' services running on virtual servers, as one might expect (certificate server, BES, and an interface monitor), but we just migrated one of our production servers from a physical environment to a virtual environment. It's running on 'dedicated' hardware still (i.e., it has an ESX server all to itself), but it's still a step in this direction. The decision was made largely by the IT department, and both the business and the vendor signed off on it.
My first real production virtualization deployment was back in mid-2004, I believe, using VMware GSX – v3.0 at the time, I think (now called VMware Server).
The deployment was an emergency decision that followed a failed software upgrade to a cluster of real production servers shared by many customers. The upgrade was supposed to add support for a new customer that was launching within the week (they had already started a TV advertising campaign). Every attempt was made to make the real deployment work, but there were critical bugs and it had to be rolled back. After staying up all night working on it, people started asking what we were going to do next.
One idea (I forget whose – maybe it was mine) was to build a new server with VMware, transfer the QA VM images to it (1 Tomcat web server, 1 BEA WebLogic app server, 1 Win2k SQL/IIS server; the main DB was on Oracle and we used another schema for that cluster on our existing DB), and use it for production – that would be the fastest turnaround to get something working. The expected load was supposed to be really low, so we went forward. I spent what felt like 60 of the next 72 hours getting the systems ready and tested over the weekend with some QA help, and we launched on schedule the following Monday.
Why VMs and not real servers? Well, we already had the VM images, and we were really short on physical servers, at least good ones anyway. Back then, building a new server from scratch was a fairly painful process, though not as painful as integrating a brand new environment. What would usually take weeks of testing we pulled off in a couple of days. I remember one of the tougher/last issues to track down was a portion of the application failing due to a missing entry in /etc/hosts (a new piece of functionality that not many were aware of).
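That kind of missing-hosts-entry bug is quick to rule out before blaming the application. A minimal sketch (the hostname `app-db.internal` is a made-up placeholder, not from the original story): `getent hosts` resolves a name through the same NSS path the application uses – typically /etc/hosts first, then DNS, per /etc/nsswitch.conf – so it shows exactly what the app would see.

```shell
#!/bin/sh
# Hypothetical hostname -- substitute the name your app fails to reach.
HOST="app-db.internal"

# getent consults NSS (normally /etc/hosts, then DNS), unlike
# nslookup/dig, which query DNS only and would miss a hosts entry.
if getent hosts "$HOST" > /dev/null 2>&1; then
    echo "$HOST resolves to: $(getent hosts "$HOST" | awk '{print $1}')"
else
    echo "$HOST does not resolve -- check /etc/hosts and DNS" >&2
fi
```

Comparing `getent hosts` output against `dig`/`nslookup` output quickly tells you whether a stale or missing /etc/hosts line is the culprit.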
The system was slammed with more traffic on the first day than we were expecting for the entire month, so it really had trouble keeping up. We built a couple more web servers to help handle the load as fast as we could, and it stayed in that configuration for some time – several months at least. Running (tens of?) thousands of dollars' worth of e-commerce traffic a day over a single dual-processor system running GSX on top of 32-bit Linux with 16GB of RAM and internal disks.
Really saved the company's ass. And it laid the groundwork for future clusters as well – they moved (mostly) away from the shared model to dedicated clusters for the larger customers, so that if one customer's code was bad it could be rolled back without impacting other customers.
So hell yeah, it's ready. I still think it requires some intelligence on the part of the people deploying it – you can get really poor results if you do the wrong things (which are by no means obvious). But the same is true for pretty much any complex piece of software.