Virtualising your infrastructure? Get your numbers right
Avoid nightmares later
Virtualisation can be a powerful tool for your IT department, making your infrastructure far more efficient. But without proper planning it is easy to trip yourself up by not scaling the system properly.
How can you plan the capacity needed for a virtualised system so that you don’t end up overspending or under-resourcing? Here are some key points to consider.
One of the biggest mistakes companies can make is trying to run consolidated systems over non-consolidated networks. Discrete data centre networks for specific components such as storage and servers are no longer appropriate.
As companies move to private cloud computing, in which systems are consolidated onto fewer devices, networks must be simpler to manage while supporting large volumes of traffic.
Converged networks, with storage and server traffic running across a unified fabric, form the basis for next-generation data centres.
The converged network protocol wars are largely settling down and it is now possible to buy vendor-agnostic 10Gbit connections that will support your servers and storage well enough to help you avoid problems such as bootstorms.
Managing virtual systems over a single network fabric can make life far easier for administrators, decreasing management costs while maintaining system uptime.
Before you can scale up a pilot it helps to understand what your current system is doing. Baselining system activity using management tools is a crucial step, but there are different levels of knowledge: knowing what is installed is not the same as knowing how heavily it is used.
“Nine out of ten times your customers think they have the numbers, but it’s just the apps installed on a workstation they are looking at and not so much the utilisation of those apps,” warns Erwin Vollering, service director for virtualisation at Glasshouse Technologies.
Measuring that, with network probes or client-side agents, will give you an idea of what to expect when your desktop virtualisation system goes from 15 to 1,500 machines.
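As a rough illustration of that extrapolation, here is a minimal sketch in Python. It assumes demand grows linearly with machine count, which real systems often do not, and the function name, the headroom parameter and the figures in the usage note are hypothetical rather than taken from any vendor tool.

```python
def project_demand(pilot_total, pilot_machines, target_machines, headroom=0.0):
    """Scale a measured pilot total linearly to a larger machine count.

    pilot_total     -- measured demand across the whole pilot (any unit, e.g. IOPS)
    pilot_machines  -- number of machines in the pilot
    target_machines -- number of machines in the scaled system
    headroom        -- extra safety margin as a fraction (0.25 means 25 per cent)

    Assumption: per-machine demand stays the same at scale.
    """
    if pilot_machines <= 0 or target_machines <= 0:
        raise ValueError("machine counts must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    per_machine = pilot_total / pilot_machines
    return per_machine * target_machines * (1 + headroom)
```

For example, a made-up pilot total of 300 units across 15 machines projects to 30,000 units at 1,500 machines, or 37,500 with a 25 per cent margin. Treat the result as a starting point for a conversation with vendors, not a sizing guarantee.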
Don’t forget to sample your system over time to get a sense of how spiky your computing demand is. Then you can plan capacity to cover a chosen percentage of your peak load.
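One way to turn samples into a planning figure is to take a percentile of the observed load and compare the peak with the average. The sketch below is an assumption about how you might do it, using the nearest-rank percentile method; the function names and the choice of percentile are illustrative only.

```python
import math


def capacity_target(samples, percentile=95.0):
    """Return the nearest-rank percentile of a list of load samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least `percentile`
    # per cent of the samples at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def spikiness(samples):
    """Peak-to-mean ratio: the higher it is, the spikier the demand."""
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples) / (sum(samples) / len(samples))
```

A high peak-to-mean ratio suggests that sizing for the average will leave users stranded during bursts, while sizing for the absolute peak may mean paying for idle hardware. Which percentile to plan for is a business decision, not something the code can settle.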
Talk to vendors
When planning for a scaled system, it makes sense to get a comprehensive picture of your hardware’s capability. Take as much information as possible from your storage vendors about how their systems will perform at certain loads.
In an application-centric world, you must be sure to factor in the software too. How capable is the software of working at scale? Is your relational database going to cut it, or do you need to move to a large, multi-node database?
In an age of Big Data, that becomes a critical decision. You may need to completely rethink the way you deliver an application, especially if it is a legacy app designed for a different infrastructure.
Consolidated virtualised systems work far more effectively when all of your hardware is properly integrated.
The next-generation data centre features a unified network fabric supporting an integrated stack, in which storage, servers, and networking equipment are all optimised to work together.
This provides better performance and also makes it easier for systems management tools to get an end-to-end view of the hardware infrastructure, and therefore of the software running on it. Consider a next-generation data centre architecture to keep consolidated systems running more smoothly.
Points of failure
Systems may be designed based on an expectation of a specific number of transactions in a given time window.
When a system’s transaction volume scales, single points of failure become a real issue. If a switch fails in your pilot system, the failover switch may be able to take the strain. But when you scale by an order of magnitude, will that still be the case?
Designing multiple failure zones that are segmented in different parts of the system is a good idea, so that a single failure cannot compromise the whole infrastructure at once.
Ideally, you should also design those zones to cope with the system’s entire peak load so that they can pick up the slack in the event of a failure.
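A simple sanity check for this is to ask whether the surviving zones could still carry the peak load if any one zone disappeared. The sketch below takes that reading of the advice. The zone names and capacity figures are hypothetical, and it assumes load can be redistributed freely between zones, which may not hold in practice.

```python
def survives_single_zone_loss(zone_capacities, peak_load):
    """Map each zone name to True if the remaining zones can carry
    peak_load when that zone alone fails.

    zone_capacities -- dict of zone name to capacity (same unit as peak_load)
    """
    total = sum(zone_capacities.values())
    return {
        zone: (total - capacity) >= peak_load
        for zone, capacity in zone_capacities.items()
    }


def weakest_points(zone_capacities, peak_load):
    """List the zones whose loss would leave the system short, sorted by name."""
    result = survives_single_zone_loss(zone_capacities, peak_load)
    return sorted(zone for zone, ok in result.items() if not ok)
```

Running the same check with the pilot's peak load and then with ten times that figure is a quick way to see whether a design that tolerated a failure at small scale still does so after scaling up.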
This can be difficult as the scale of the system increases. One reason that Amazon’s Dublin-based Elastic Block Store system took so long to get back online recently was that so many servers went down at once because of a transformer failure.
The ones that were left kept trying to replicate to anything they could find, which led to customers’ volumes being “stuck”. The firm had to truck in more equipment to increase capacity.
“There were delays as it was night time in Dublin and the logistics of trucking required mobilising transportation some distance from the data centre,” the firm said.
“Plan and test a restart,” says John Soanes, head of architecture at IT managed services firm Adapt.
“Because you have contention, you have successive failures. So it’s really important to try and bring up a large system in a controlled way.”
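A controlled restart can be as simple as bringing hosts back in fixed-size waves with a pause between them, so that they do not all contend for storage and network at once. The following sketch only builds the schedule. The wave size, the interval and the host names are assumptions for illustration, and a real plan would also need to respect dependencies between services.

```python
def restart_schedule(hosts, wave_size, interval_seconds):
    """Split hosts into waves and give each wave a start offset in seconds.

    Returns a list of (offset_seconds, [hosts in this wave]) tuples,
    preserving the order in which the hosts were supplied.
    """
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    if interval_seconds < 0:
        raise ValueError("interval_seconds must not be negative")
    schedule = []
    for wave_index, start in enumerate(range(0, len(hosts), wave_size)):
        offset = wave_index * interval_seconds
        schedule.append((offset, hosts[start:start + wave_size]))
    return schedule
```

Testing the schedule against a non-production copy of the environment is the only way to find out whether the chosen wave size is small enough to avoid the successive failures described above.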
The axiom “garbage in, garbage out” pervades every aspect of IT, from coding through to systems design. It applies to virtualisation, too.
Get quality information at the start to avoid problems later. And consider designing everything from the bottom up as a converged architecture to increase performance and simplify management. ®