Steer clear of the desktop virtualisation bootstorm
Prepare to form an orderly queue
It is every IT administrator’s worst nightmare.
All the employees’ desktops have been virtualised and are running on a server. The pilot project worked well and everyone was happy, but then the team tried to scale it up and now it’s Monday morning and 3,000 users have just walked in with their lattes and croissants, sat down at their shiny thin-client machines and tried to log on.
The system grinds to a halt and people are waiting 20 minutes or more to get to their desktops. Productivity is falling and in about 30 seconds the helpdesk will be flooded with angry calls. It's what is known as a bootstorm.
It isn’t just underpowered servers that can cause bootstorms. They can be due to I/O bottlenecks in two areas: storage, and the network. Disks in a storage area network can spin only so fast, and when too many people are trying to access their machines all at once it is not fast enough.
Even if the disks can get the data off quickly enough it still has to get to the server. That can cause problems if the network connection between the SAN and the server is too slow.
What to do?
Ross Bentley, head of professional services at consulting firm Assist, suggests configuring the network to stagger virtual desktop bootups.
“You can get to the point where you know the whole environment can work by turning 50 virtual machines on at the same time,” he says.
Doing it in chunks of 50 could get all the users up in a few tens of minutes, although it would mean starting before the users get into work.
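The arithmetic behind that estimate is easy to check. The sketch below is illustrative only: the figures of 3,000 users and batches of 50 come from the scenario above, but the 30-second gap between batches and the 09:00 target are assumptions, and `stagger_schedule` is a hypothetical helper rather than part of any virtualisation product.

```python
import math
from datetime import datetime, timedelta


def stagger_schedule(total_vms, batch_size, batch_interval_s, ready_by):
    """Work out when staggered boots must begin to finish by `ready_by`.

    Returns (start_time, number_of_batches). Assumes each batch needs
    `batch_interval_s` seconds before the next one can be started.
    """
    batches = math.ceil(total_vms / batch_size)
    duration = timedelta(seconds=batches * batch_interval_s)
    return ready_by - duration, batches


# Assumed target: everyone's desktop ready by 09:00 on a Monday.
ready_by = datetime(2024, 1, 8, 9, 0)
# Assumed gap between batches: 30 seconds.
start, batches = stagger_schedule(3000, 50, 30, ready_by)
print(f"{batches} batches, first batch starts at {start:%H:%M}")
```

With those assumed numbers the boots take 60 batches spread over 30 minutes, so the first batch would need to start at 08:30. A slower storage back end means a longer gap between batches and an earlier start.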
George Crump, chief steward at analyst firm Storage Switzerland, disagrees with the solution. For one thing, bootstorms are not limited to the morning login.
“For example a virus scan might kick off at the same time or a big patch update. In any case, you can’t pre-log them in because you violate all kinds of security,” he says.
One approach, in a Remote Desktop Services scenario, is to use a single shared image, often called a golden image, rather than replicating a new machine for every user. You could then use folder redirection to access users’ personal data. This drastically reduces the number of separate images that are accessed on the disk.
“The other thing about a golden image is that it makes it more affordable to move that into solid state storage,” says Crump.
Removing the mechanical element of storage speeds up access. You might not want to store 3,000 virtual machines on expensive SSDs but a single image would work.
Administrators also have another element to deal with: the network. A poorly configured network that can’t transport data from virtual machines to the server fast enough will cripple performance.
Moving to fibre channel for the SAN interface speeds things up, but that requires a different set of administrative skills – not to mention a whole new set of host bus adaptors – which ratchets up cost.
Another option is to use direct-attached storage to ease the bottleneck. However, this doesn’t give you a free pass, according to Sylvester de Koster, group technical manager at distributor CDG UK.
“Even with direct storage, if you don’t configure it right CPU or disk or memory will be overused, and that causes major issues,” he says.
Understanding the workers
Hamish Macarthur, founder of storage analyst Macarthur Stroud, points out that simply configuring storage and network links for high performance is not enough. You have to understand the types of users you have and their working patterns.
“There might be issues when people are in different time zones and that might flatten things out,” he says. “You need management tools to recognise the spikiness of the load.”
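One rough way to put a number on that spikiness is to bucket logon times and compare the busiest interval with the average. The sketch below assumes you can export logon timestamps from whatever management tool is in use; `logon_profile`, the five-minute bucket and the sample times are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def logon_profile(timestamps, bucket_minutes=5):
    """Return (busiest_bucket_start, peak_count, peak_to_mean_ratio).

    `bucket_minutes` should divide 60 evenly. The mean is taken over
    buckets that saw at least one logon, not over the whole day.
    """
    if not timestamps:
        raise ValueError("no logon timestamps supplied")
    buckets = Counter()
    for ts in timestamps:
        # Round each timestamp down to the start of its bucket.
        floored = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        buckets[floored] += 1
    peak_bucket, peak = max(buckets.items(), key=lambda kv: kv[1])
    mean = sum(buckets.values()) / len(buckets)
    return peak_bucket, peak, peak / mean


# Made-up sample: most people log on just after 09:00.
day = datetime(2024, 1, 8)
sample = [
    day.replace(hour=8, minute=58),
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=1),
    day.replace(hour=9, minute=2),
    day.replace(hour=9, minute=4),
    day.replace(hour=9, minute=31),
]
peak_bucket, peak, ratio = logon_profile(sample)
print(f"busiest bucket {peak_bucket:%H:%M}: {peak} logons, {ratio:.1f}x the mean")
```

A ratio close to 1 suggests a flat load, as with users spread across time zones. A high ratio points to the kind of spike that storage and network links have to be sized for.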
Baselining storage and network demand is therefore a crucial element of the desktop virtualisation process, and that includes knowing how much inventory you have.
“You might believe that you’re providing PCs or desktops for 1,000 users, but there might be another 500 to 600 out there connected via supplier, or maybe a customer that ties in a bit more closely,” Macarthur says.
Balancing machine density is important to prevent port I/O and CPU bottlenecks, but there is no easy formula to balance the ratio between physical boxes and virtual machines. “It’s a trial-by-error process,” says Crump.
Part of the challenge of desktop virtualisation is maintaining the user experience – and employees allow little margin for error.
Avoiding bootstorms entails both hidden costs and a high degree of configuration expertise. Are you ready for that? ®