Everything you always wanted to know about VDI but were afraid to ask (no, it's not an STD)
All you need to make virtual desktops go
So you know your way around a data center just fine, but you've been told to roll out VDI – aka that highly riveting technology, virtual desktop infrastructure. This is your first time juggling virtualization on this scale. What do you need to worry about? How will you test and benchmark it?
Our sysadmin blogger Trevor Pott presents this handy cut-out-and-keep guide, complete with four scenarios for you to work with.
In any VDI setup there are four major elements that will determine the performance and user-friendliness of your system: the graphics; your network bandwidth; your servers' resources; and the storage of all that data. If any one element is out of balance with the rest, your virtual desktop beast will fail the user.
As well as these elements, VDI has three basic flavors in the Windows world: Remote Desktop Services (RDS, formerly known as Terminal Services); Windows Server VDI; and Windows Client VDI. In the latter two approaches, each user gets one virtual machine all to themselves: one user per operating system instance.
In an RDS scenario, multiple users log onto a single Windows Server instance and share that installation. Multiple RDS servers can exist in the VDI pool, but the differentiator with RDS is that concept of multiple users logged on to a single copy of the operating system.
This multiuser concept has real-world significance. An application installed on the RDS server is available to all users and must be licensed as such. An anti-malware sweep against the OS will affect all users, and it is possible for one heavy user to degrade the experience for all other users. (Though Server 2012 and Server 2012 R2 do have quite a few nice new technologies to thwart processor hogs.)
Perhaps most critically, you may have applications that are not exactly multiuser friendly. One user crashing an application can break it for all other users, and if someone manages to crater the OS, they take it out for everyone.
The alternative "one user per operating system" model employed by Windows Server VDI and Windows Client VDI uses more space and resources per user, but eliminates the possibility of one person ruining it for the rest.
Unfortunately, licensing Windows for the aforementioned three basic types of VDI differs substantially from flavor to flavor, and you should investigate which approach is right for you. Since today we're discussing the engineering side of implementing VDI, this article won't go into the nitty-gritty of licensing (although this PDF should be your first port of call). Instead, we'll roll onto the elements that make up a decent VDI stack.
Graphics: Don't get squeamish about the graphic detail
Generating on-screen graphics is most often the single biggest drain on any VDI setup. It is also the most frequently overlooked consideration and the least understood by those new to the field.
Everyone should know that the general-purpose processors in PCs offload graphics work to dedicated GPUs: the most basic graphics processors in today's desktops can rapidly render 2D and 3D scenes on large (or multiple) monitors.
This spares the general-purpose CPUs from having to juggle graphics plotting and the compute workload of your daily dozen, thus maximizing performance all round.
Without GPU virtualization, VDI deployments must not only handle the usual CPU demands of applications, they must create virtual GPUs out of CPU resources for each VDI instance you provision. Even the most powerful x86 servers can be quickly felled by a handful of VDI users streaming video from their virtual machines, something that happens more often than you may think.
The number one element that is going to affect your score in VDI synthetic benchmarks is graphics generation. Even if every element of your deployment is well balanced, if you are relying on your server CPUs to provide graphics for your VDI instances, your score is going to be pretty bad.
This doesn't mean that the real-world results will be bad. This is where understanding your users and their workload comes in. Are your users going to be doing a bunch of word-processing and spreadsheet work with no video, animations, and so forth? If yes, you can get away without GPU virtualisation. Are your users going to be doing a lot of image-heavy work or watching training videos? If so, plan for GPU virtualisation from the outset.
Bandwidth: Feel the thickness
You can have the greatest server farm in the world, perfectly specified to provide a beautiful user experience, but it all means nothing if you can't get those screens to users. If everyone is wired up with gigabit Ethernet, you're probably fine. You aren't, however, going to drag a 1080p video experience over a 3G mobile connection.
You also need to bear in mind that VDI almost always results in a change in usage patterns. Whatever your usage patterns are today, expect that VDI deployments will ultimately see more people working remotely, be that telecommuting from home or pulling down their desktop at a hotel or business meeting. You need enough WAN bandwidth to meet not just today's needs, but tomorrow's.
Consider also that by deploying VDI you may be changing the access patterns of your entire data center. Instead of a great deal of "north-south" network traffic your traffic is now "east-west". (The directions are based on a traditional network diagram in which "north" is the physical user desktop and "south" is centralized storage.)
After deploying VDI, a user's applications won't be coming through your edge switches with the same access patterns as before. Instead, you are going to have a bunch of servers chatting merrily among themselves, creating new network bottlenecks that need to be modeled before you deploy.
Your storage choice will also have an impact on your bandwidth consumption. Centralized storage will require either a converged network adapter or a dedicated network to shuttle data around, and hypervisor vendors recommend using a dedicated NIC for VM migration and replication traffic on each compute node.
Dedicated NICs are also a real-world requirement for building server SANs and for host-based write caching; count up the ports you'll need per host and make sure you've got enough switches to handle it all.
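The port count is simple arithmetic, and a rough sketch like the one below is enough to get started. The traffic classes, the ports assigned to each, the 20-host cluster and the 48-port switch are all illustrative assumptions rather than vendor recommendations; swap in your own figures.

```python
import math

# Hypothetical per-host port budget; every number here is an assumption.
PORTS_PER_HOST = {
    "user_network": 2,           # redundant pair carrying desktop sessions
    "storage": 2,                # centralized storage or server SAN traffic
    "migration_replication": 1,  # dedicated NIC for VM migration/replication
    "write_cache": 1,            # host-based write cache traffic
}

def ports_needed(hosts: int, per_host: dict[str, int]) -> int:
    """Total switch ports consumed by all hosts."""
    return hosts * sum(per_host.values())

def switches_needed(total_ports: int, ports_per_switch: int = 48) -> int:
    """Minimum switch count, ignoring uplinks and spare ports."""
    return math.ceil(total_ports / ports_per_switch)

total = ports_needed(hosts=20, per_host=PORTS_PER_HOST)
print(total, switches_needed(total))
```

In practice you would add uplinks, spares and redundancy on top of this minimum, so treat the result as a floor.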
Server resources: How much brawn is in that bare metal?
The purpose of VDI is to allow users to run applications from a remote location. Typically, the promise is any³: any application on any device at any time. Running these applications consumes resources. A dozen VDI instances all running Microsoft Word aren't going to consume a lot of CPU to do so, but a dozen running AutoCAD will.
The resources consumed by your users' applications do not add up in a simple linear fashion. You cannot simply profile the resources consumed by their desktops, add them together, and call it a day. There is overhead involved in virtualization, and that overhead can vary dramatically from deployment to deployment, as some systems have devices for offloading workloads and some do not.
Graphics generation is an obvious concern here; the more graphics you have to generate, the more overhead there is. This overhead can get to the point that it seriously impinges upon your server resources, but CPU resources are by no means the only consideration.
Servers have limited system memory capacity and bandwidth. Even if you have enough RAM to handle all the VDI instances on a given server, you will not necessarily have enough RAM bandwidth to handle them all. RAM bandwidth is typically so high that most people never give it a second thought, but when you start running 100 desktops on a single server it can become a hidden bottleneck in a real hurry.
Hypervisors can overcommit with memory, allowing (say) a server with 16GB of physical RAM to juggle 32 VMs each allocated 1GB of RAM, on the basis that each virtual machine will probably only use a few hundred MBs of RAM. This can put a lot of pressure on system memory bandwidth.
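The overcommit example above can be checked with a few lines of arithmetic. The 16GB, 32 VM and 1GB figures come from the text; the 0.3GB and 0.6GB "actually used" values are assumptions standing in for "a few hundred MBs" and a busier day, and hypervisor overhead is ignored.

```python
def overcommit_ratio(physical_gb: float, vm_count: int, allocated_gb_per_vm: float) -> float:
    """Allocated RAM divided by physical RAM; above 1.0 means overcommitted."""
    return (vm_count * allocated_gb_per_vm) / physical_gb

def fits_in_ram(physical_gb: float, vm_count: int, used_gb_per_vm: float) -> bool:
    """True if the RAM the VMs actually touch fits in physical memory."""
    return vm_count * used_gb_per_vm <= physical_gb

# Figures from the text: 16GB host, 32 VMs, 1GB allocated to each.
ratio = overcommit_ratio(16, 32, 1.0)

# Assumption: "a few hundred MBs" is taken as 0.3GB of real use per VM.
ok_light = fits_in_ram(16, 32, 0.3)
# Assumption: a busier day on which each VM really touches 0.6GB.
ok_heavy = fits_in_ram(16, 32, 0.6)

print(ratio, ok_light, ok_heavy)
```

The point of the second check is that an overcommit ratio that is comfortable on a quiet day can tip over as soon as real usage creeps up.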
In a similar manner, it would seem ludicrous to expect a single-workload system to max out its PCIe bus. With VDI, however, this is increasingly simple. 40GbE NICs, PCIe SSDs, GPUs for graphics offloading, and so forth can not only fill all the PCIe slots in the system, they can cause very real performance bottlenecks.
Storage: All those bytes have to go somewhere
Storage is all too often the least considered component of VDI. If you look at the "average" desktop, it doesn't do a lot during the day. Unless you are using your VDI instances to run a lot of rendering to the local virtual disk, the only real punishment you're going to see is during logon, logoff and updates. (Malware scanning used to be an issue but vShield takes care of that nicely.)
The problem with logon, logoff and update events is that they tend to occur all at the same time. Thus VDI storage has to be spectacularly overprovisioned (speed-wise) when compared to the daily grind. When it comes to figuring out how your gear will perform, even the best synthetic benchmark software does not appropriately model storage demand. It is, from experience, the hardest element to pin down.
The "average" VDI user is expected to sustain about 10 IOPS throughout the day, but I have plenty of user groups that will gladly sit at 200 IOPS all day long, and some that will barely break an average of 4. More than any other area, storage is where "knowing your users and their workloads" matters.
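To see why the "average" figure misleads, here is a minimal sketch that totals steady-state IOPS by user group. The per-user rates of 4, 10 and 200 IOPS are the ones quoted above; the group sizes are invented purely for illustration.

```python
# Per-user steady-state IOPS from the text; group sizes are made-up examples.
GROUPS = [
    {"name": "light", "users": 300, "iops_per_user": 4},
    {"name": "average", "users": 500, "iops_per_user": 10},
    {"name": "heavy", "users": 50, "iops_per_user": 200},
]

def steady_state_iops(groups) -> int:
    """Sum of users multiplied by per-user IOPS across every group."""
    return sum(g["users"] * g["iops_per_user"] for g in groups)

def naive_estimate(groups, assumed_iops_per_user: int = 10) -> int:
    """What you would budget if every user were treated as 'average'."""
    return sum(g["users"] for g in groups) * assumed_iops_per_user

print(steady_state_iops(GROUPS), naive_estimate(GROUPS))
```

With these assumed group sizes, 50 heavy users demand as much as the other 800 combined, and the "everyone is average" estimate comes in at roughly half the real load.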
Scenario 1: Designing for a knowledge worker
Consider for a moment the standard "knowledge worker": someone on the end of a string typing away into a word processor, answering instant messages and occasionally browsing the web. If you hand him or her a non-persistent desktop then update storms over the network aren't an issue; the only real storage traffic you're going to see is during logon and logoff.
Here, write caches – be they at the array or host-based – are your best friend. You can fill up the SSDs in the write caches during logon and logoff and they can sit there and drain to the storage for the rest of the day. This is the exact scenario in which they excel. Atlantis' ILIO, PernixData's FVP, hybrid storage arrays (Tintri, Tegile, Nimble) and server SANs (Maxta, Nutanix, VSAN, etc) could all be of use here.
This isn't to say that other storage technologies won't have an effect. Host-based read caches (VMware's vFlash, and Proximal Data's AutoCache) are an example of a technology that will be of some help handling the logon/logoff spikes, even if not quite so much as other write cache solutions. Still, read caches are comparatively cheap and can be slotted into existing infrastructure. As such they are also worth a look for this scenario.
If all your users log on at the exact same time then write caches will pay for themselves in short order; they will prevent your staff from sitting and waiting for their desktop to appear. If, however, your staff log on in a somewhat staggered fashion over the course of an hour then you can probably get away with a host-based read caching solution and save yourself a ton of money.
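The difference between a simultaneous and a staggered logon can be estimated with a crude model. The sketch below assumes logons are spread evenly across the window, and the 30-second logon duration and 50 IOPS per logon are placeholder assumptions, not measured values; profile your own desktops before trusting any number it produces.

```python
def peak_logon_iops(users: int, window_s: float,
                    logon_s: float = 30.0, iops_per_logon: float = 50.0) -> float:
    """Peak storage load if logons are spread evenly across window_s seconds.

    logon_s and iops_per_logon are placeholder assumptions; measure your own.
    """
    if window_s <= logon_s:
        concurrent = users  # everyone is mid-logon at the same moment
    else:
        concurrent = users * logon_s / window_s
    return concurrent * iops_per_logon

everyone_at_nine = peak_logon_iops(500, window_s=0)
spread_over_hour = peak_logon_iops(500, window_s=3600)
print(everyone_at_nine, round(spread_over_hour, 1))
```

Under these assumptions the simultaneous case peaks two orders of magnitude higher than the staggered one, which is the gap a write cache is being asked to cover.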
Deciding what to use becomes a "bang for your buck" calculation. This is where understanding your users and their use patterns will make a big difference. How much are you willing to spend, and what kind of logon/logoff performance do you really require?
Scenario 2: The media professional
A "media professional" is a completely different beast altogether. Here you have someone doing professional image editing, video editing, simulation, AutoCAD or other "heavy" workloads. GPU virtualisation is going to make the biggest difference to them; however, their storage profile is so different from the knowledge worker that different products will suit them better.
These folks spend a lot of time reading great big files and doing lots of work with them. They do a lot more reading than writing and when they do render out their files, they almost never write those files to storage managed by the hypervisor. Those files are likely going out to some battered CIFS/SMB server halfway across the network and upgrading the speed of that server is likely to make the biggest difference to the write portion of their workload.
This means that while popping GPUs into your VDI servers and beefing up the server where things get rendered will make the biggest difference for this group, host-based read caches should be a consideration for providing a decent user experience. Similarly, hybrid arrays and certain server SAN configurations should do the trick, though they may be quite a bit more expensive.
Another note for this workload category is that profiling network bandwidth – not just storage bandwidth – is important. Lots of sysadmins put a nice, fat 10GbE storage NIC into their VDI systems, but then leave the "user" network at 1GbE. If your users are writing their rendered files out to a CIFS/SMB server then there's a good chance that traffic will head out over your "user" network, and you need to make certain it is adequately provisioned.
Scenario 3: The perpetual-write workloads
Another group of users that I frequently deal with are "perpetual write" folks. In general, they don't do a lot of graphically intensive work, but they sit there doing lots of fiddly writes to their local file system all day long.
One example is developers, who are constantly compiling something or other, hammering a local database copy or so forth. Analysts are another group; they're usually turning some giant pile of numbers into another giant pile of numbers and when you let them run around in groups what you see is entire servers full of users emitting write IOPS all day long.
A host-based write cache might help with this, assuming that you sized the cache high enough to absorb an entire day's worth of work into the cache and assuming it had enough time to drain to centralised storage during off hours. This is also assuming there are off hours to drain the cache.
If there isn't much in the way of downtime then attempting to use a host-based write cache can get fantastically expensive very, very quickly. Yes, host-based write caches drain even while accepting new writes, but if your centralised storage is fast enough to absorb writes from hosts hammering it nonstop in real time, why are you using a write cache?
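The fill-and-drain trade-off can be sketched as follows. This is a deliberately simple model that assumes constant write and drain rates; the function name and all example figures are hypothetical.

```python
def cache_plan(write_mb_s: float, drain_mb_s: float,
               busy_hours: float, idle_hours: float):
    """Return (peak cache backlog in GB, whether it empties before the next busy period).

    Assumes constant rates: writes arrive at write_mb_s during busy hours,
    and central storage absorbs drain_mb_s at all times.
    """
    backlog_mb = max(0.0, write_mb_s - drain_mb_s) * busy_hours * 3600
    drained_mb = drain_mb_s * idle_hours * 3600
    return backlog_mb / 1000, drained_mb >= backlog_mb

# Hypothetical figures: 200MB/s of writes against 120MB/s of drain capacity.
print(cache_plan(200, 120, busy_hours=10, idle_hours=14))  # plenty of off hours
print(cache_plan(200, 120, busy_hours=20, idle_hours=4))   # almost no off hours
print(cache_plan(100, 150, busy_hours=24, idle_hours=0))   # storage keeps up anyway
```

The third case is the rhetorical question above in numeric form: if central storage can keep up in real time, the backlog is zero and the cache has nothing to do.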
Similarly, server SANs and hybrids can have the same issues. If their design is ultimately a write cache on top of spinning rust, their usefulness is determined by whether or not they have time to drain writes to said rust. LSI's CacheCade would be an example.
Some hybrid arrays write everything to flash then "destage" cold data to the spinning rust. These would work just fine, so long as there is enough flash on the system. This is completely different from how host-based write caches work.
Host-based write caches must drain to central storage; that's part of the design. Both hybrids and server SANs have companies in play where data is never drained from the SSD to rust. This is known as "flash first". In "flash first" setups, all data is written to the SSD, so it behaves like an all-flash array as far as most workloads are concerned. Only "cold" data is ever destaged to rust; "hot" data never gets moved or copied to rust at all.
Host-based read caches are going to be (at best) a marginal help. By caching reads they will take some pressure off the centralised storage (leaving more IOPS for writes); however, if the cumulative workload is predominantly write-based, you aren't going to see much improvement.
If you only have a few of these write-intensive workloads, then any of the write caches discussed above will make the problem simply go away. If instead these write intensive workloads make up the majority of your deployment, and your deployment is running over a thousand instances, you should be seriously considering all-flash storage because you will likely need much more flash than can be housed on a local server.
Scenario 4: Mixed workload storage
In researching this article, I talked to several VDI experts and dozens of sysadmins who've been through the minefield with workloads far different from the ones I maintain. The number one piece of advice that these folks will give a newbie is "deploy VDI on its own infrastructure." Mixing and matching with other virtual workloads is frowned upon, and for good reason. Unless you know exactly what you're doing, mixing VDI and general virtualisation will get you into a heap of trouble.
We don't all have the luxury of following this advice. I've been doing VDI for about a decade now, and far too many deployments have been on mixed storage. Pilot projects are often run on existing infrastructure, and smaller shops are generally lucky to have centralised storage at all. In many cases, dedicated infrastructure for VDI just isn't going to happen.
Being able to support mixed workload storage is the Holy Grail: one storage technology for all scenarios. The problem is that while every storage vendor and their mum claims that the kit they're shifting is a "one size fits all" panacea for all ills, nothing out there actually is. If you want to alienate every storage vendor on the planet, this is the elephant in the room to discuss. (Hey there guys, how y'all doing?)
While all-flash arrays absolutely are the "one size fits all" from a workload perspective, for the overwhelming majority of companies out there, all flash is simply too expensive, especially when you would need to put all workloads on it.
Hybrids, server SANs and host-based write caching all battle it out on features. Replication, active-active clustering, deduplication, compression ... competition is so fierce that trying to pick the right one can be confusing. All these features rely on there either being enough downtime to do their various background processes or enough wiggle room in the IOPS load to meet demand while doing their storage voodoo in real time.
If these solutions become overwhelmed – or their flash fills up – active workloads have to start going to spinning rust. In this situation the entire virtualisation infrastructure will go from "awesome" to "unusably slow" in an instant. This is rare, but it does happen. I've seen the change in IOPS be so sudden and dramatic that over 40 per cent of VMs simply stopped responding and ultimately crashed.
Bear in mind as well that mixing and matching different VDI workload classes can have a similar (though usually not as dramatic) "mixing" effect. I've had VDI experts tell me that in large deployments they create separate infrastructures for each class of workload just to avoid this.
In some cases it is merely separate clusters: GPU-accelerated workloads on systems with nVidia GRID cards, standard workloads on CPU-only servers. In other cases, they've mixed storage as well; high-demand clusters got caching software and SSDs installed, low-demand clusters did not.
Practical considerations for mixed workloads
If you must work with mixed workloads, management software can make all the difference. One of the reasons I'm such a fan of Tintri is that its management software is aware of the above issue and keeps an eye on how much performance you have remaining. It will alert you if you start getting close to the red line so that you can do something about it before everything goes pear-shaped. Several other vendors have similar systems.
I've spent the past two years swimming in storage, and the biggest bang for the buck I've found for mixed-workload environments is pairing a host-based read cache (AutoCache) with a primitive hybrid central storage (CacheCade). Price and simplicity are what ultimately mattered. My customers don't have money to burn and they don't have the knowledge required to fiddle with a bunch of nerd knobs "optimising" their storage every time they make a change.
It's the simplicity that sells it; if the central storage turns to glue, the VMs can still read the vast majority of what they need to read without having to ask the centralised storage for that information. Most VMs won't even notice that central storage has temporarily slowed to a crawl.
My anecdotal example is a data center in which the backup software would detect if it hadn't been run in the past X hours and trigger if this was so. A power grid failure had the data center down for two days. When everything came back online the staff immediately logged into their VDI instances and started doing a lot of write-intensive analytics work.
The central storage eventually collapsed under the combination of that write strain, sysadmins taking the opportunity to patch several servers, database integrity checking and the backups for all VMs triggering at the same time. Two months later it happened again, this time with host-based read caching installed and the network not only didn't collapse, it was usable throughout the recovery process.
The takeaway here is that if you plan to run your VDI mixed in with other workloads, model everything and monitor your storage usage in an automated fashion. Server workloads do all sorts of things that demand huge amounts of IOPS for prolonged periods of time. Unchecked storage demand conflicts can seriously degrade the VDI experience for your users.
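Automated monitoring does not have to be elaborate to be useful. The sketch below is a generic illustration of a "red line" check, not any vendor's API: it assumes you can already collect IOPS samples and that you know a rated capacity for the array, and the 80 per cent threshold is an arbitrary assumption.

```python
def headroom_alerts(samples, capacity_iops: float, warn_at: float = 0.8):
    """Yield (sample index, utilisation) for samples at or above the warning line."""
    for i, iops in enumerate(samples):
        utilisation = iops / capacity_iops
        if utilisation >= warn_at:
            yield i, round(utilisation, 2)

# Hypothetical IOPS samples against an array assumed to be good for 10,000 IOPS.
alerts = list(headroom_alerts([1000, 8500, 4000, 9900], capacity_iops=10000))
print(alerts)
```

A real deployment would feed this from the array's or hypervisor's own metrics and page someone, but the logic of "warn well before the cliff" is the same.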
The Golden Image
The key to a good VDI experience lies in the care and feeding of your golden image (aka parent or master) VMs. Most people just throw a Windows VM together with minimal customisation, toss it into a pool and hope for the best. This is the exact wrong way to go about handling your VDI deployment.
The key to a good golden image lies in understanding your use case and your potential user base. This will help you move towards deciding what needs to be in your initial golden image in the first place. It is also critical for deciding between persistent and non-persistent VMs.
VMs can be set up to be dedicated, where every user always logs into the same VM which they "own" (only possible with persistent VMs). VMs can also be floating, where VDI instances are logged into on a first-come, first-served basis (when using persistent VMs) or created and destroyed on an as-needed basis (when using non-persistent VMs).
Next you start applying best practices regarding security for the OS and the applications, doing your app installs and getting an initial feel for your environment. Use as few vCPUs as possible while still maintaining a good user experience. Use paravirtualised hardware drivers as much as possible. Remove unnecessary virtual components (such as parallel and serial ports) and make sure you install the hypervisor's integration tools. Make sure you install an anti-malware application that is supported by technologies like vShield so as to reduce I/O loading.
Consider where your users will have their personalisation stored (user virtualisation), as this will have a huge impact on how the above choices will impact the end user experience. Are you going to rely on Roaming Profiles, Folder Redirection, and Group Policy, or are you going to seek out third party tools? If Citrix is to your taste, they've been buying up companies for years now and have a user virtualization technology to meet any need. VMware Horizon View Persona Management is another option, and operates differently again from the above.
Among the third parties are companies like AppSense; their user virtualization is designed to ensure users always receive a consistent, personalized desktop, whether they're logging onto a physical Windows system or a VDI instance. Think Roaming Profiles on steroids.
This contrasts with the Unidesk approach. Unidesk is designed to allow you to provide everyone with persistent desktops without having to manage eleventy squillion individual VMs or deal with the storage footprint that would occupy. MokaFive has something similar and there are countless others offering many ways to make your user virtualisation dreams come true.
Once you've gotten something that you're mostly happy with – and you've documented exactly how you installed and configured everything – delete it and rebuild it from the documentation. It should work exactly like the one you just deleted. If not, your documentation is flawed and you need to rinse and repeat until you know everything there is to know about the configuration of that VM.
Once you've made your choices and have a well-documented golden image, it's time to start cloning. Start from one golden image and modify as many as you need to suit your different user groups.
Synthetic benchmark interpretation
As you can see from the above, a synthetic benchmark to model and test your design is only going to get you so far. The best of the best – LoginVSI – has decent approximations for most workload types, but even it is not a panacea for your VDI planning ills. You need to understand what your users will do. You need to tweak the benchmark workloads to appropriately model them.
Above all, you need to pay very close attention to which elements of your benchmarks are giving you what scores. What is the bit you need to change in order to achieve the best experience for your workload? Is it a new GPU, or do you need to tweak your storage? Are you running out of CPU, or have you managed to hit the wall on your memory bandwidth? If you are planning a mixed workload environment, you may need to run more than one tool simultaneously.
VDI isn't something you can reduce to a simple "cost per instance" equation. Even small VDI deployments will inevitably have more than one class of VM, each with its own cost. Additionally, the smaller the deployment, the more floor costs matter; you need to buy a minimum of equipment and licences to even play the game, making the $/VM of a single VDI instance quite high, but the $/VM of 1,000 much lower.
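The floor-cost effect is easy to see in numbers. In the sketch below the $50,000 floor and $400 marginal cost per instance are invented figures for illustration only; real deployments will also have several VM classes, each with its own marginal cost.

```python
def cost_per_vm(instances: int, floor_cost: float, marginal_cost: float) -> float:
    """Average $/VM once the fixed floor cost is spread across all instances."""
    return (floor_cost + marginal_cost * instances) / instances

# Assumed figures: $50,000 of minimum kit and licences, $400 per extra instance.
for n in (1, 10, 100, 1000):
    print(n, cost_per_vm(n, floor_cost=50_000, marginal_cost=400))
```

With these assumptions a single instance costs $50,400 while each of 1,000 instances costs $450, which is why small pilots make VDI look far more expensive per seat than it ends up being at scale.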
Workload modelling can factor into how much you are prepared to spend per host, and that jiggles the numbers on floor costs as well as $/VM. Some host-based caching solutions will ask $10,000 or more per host for the software and the SSD combined. Can you afford that? Does it provide enough of a boost to justify the cost? Would spending $1500 per host make more sense, even if the acceleration were slightly lower?
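One way to frame the caching question is to ask what each option does to $/VM on a single host. The $10,000 and $1,500 caching prices come from the text; the $20,000 host price and the VM densities of 80, 120 and 100 are assumptions chosen only to show the shape of the calculation.

```python
def host_cost_per_vm(host_cost: float, cache_cost: float, vms_per_host: int) -> float:
    """$/VM for one host once caching hardware and software are included."""
    return (host_cost + cache_cost) / vms_per_host

# Assumed: a $20,000 host that carries 80 VMs with no caching at all.
no_cache = host_cost_per_vm(20_000, 0, 80)
# Assumed: the $10,000 option lifts density to 120 VMs per host.
premium = host_cost_per_vm(20_000, 10_000, 120)
# Assumed: the $1,500 option lifts density to 100 VMs per host.
budget = host_cost_per_vm(20_000, 1_500, 100)

print(no_cache, premium, budget)
```

Under these made-up densities the expensive option merely breaks even with no caching, while the cheaper one lowers $/VM; your own benchmark results decide which way it actually falls.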
VDI is the ultimate in workload consolidation, and it requires the most intensive modeling and simulation efforts of any systems design excepting High Performance Computing (HPC). Just like the people building HPC supers, if you are going to do VDI you need to know how each element of a system interacts.
Good luck. ®