STOP! It's dangerous to upgrade to VMware 6.5 alone. Read this
Please don't be like the admin who failed to update PSC first
Posted in Virtualization, 25th January 2018 09:04 GMT
At a client site recently, we had to investigate why an upgrade from VMware vSphere 6.0 to 6.5 had gone wrong, leaving the normally rock-solid environment a bit ill, to say the least.
On-site conversation ran something along the lines of:
“Webservices say: ‘No’.”
Cue PSC reboots, vCenter reboots and increasingly desperate measures. The service limped along like a one-legged frog with a hangover. It was one of those issues where nobody really understood why. It was a simple upgrade, right? VMware support were engaged. Even level-three support were a bit unsure and a lot of “WTF” was exhaled. Finally, the whole tale was unravelled after the admin in question was quizzed about how he had performed the upgrade.
He had upgraded the environment without upgrading the PSC first, essentially pulling the security foundation from underneath while it was running. It took VMware’s best several days to fix. During that time the weirdness varied. Some people could get work done, then not.
Software is becoming ever more complex in order to provide new experiences, doubly so for the cloud. Unfortunately, complexity also makes the upgrades a bit more difficult. Being a VMware admin and knowing VMware well, I thought it time to speak up and help those who are behind the curve to understand the process of upgrading to the all-singing, all-dancing, PSC-enabled world of VMware 6.5. I hope this will also serve as a bit of a template for upgrading VMware infrastructure.
With version 5.5 of vSphere came PSC or Platform Services Controller. PSC allows the linking of many vCenters without many of the limitations from earlier versions (i.e. linked mode).
One of the key jobs of the PSC is to act as a reverse proxy and single sign-on (SSO) infrastructure for vSphere authentication. Everything under one piece of glass. As a side note, all this refers to version 5.5 onwards. If for some crazy reason you are running older than 5.5, it means that:
- Little of this applies to you (a different PSC style was installed at that point, back in the day)
- You seriously need to upgrade. Spectre/Meltdown patches only go back as far as 5.5 for one
- Expect to pay a large chunk of cash for the upgrade licences
By default, vSphere 6.5 comes with its own local authentication system to manage the vSphere environment. It is hosted on the PSC but can be (and almost always is) superseded by Active Directory integration that is turned on shortly after. The PSC can handle multiple authentication systems and supports all the commonly used ones (AD, LDAP, local account). VMware didn’t put PSC in there just for laughs. It is a modern, expandable authentication system and should be treated as such. It provides a key part of their cloud-supporting infrastructure. So before the administrator attempts the wider upgrade, the PSC upgrade needs to be handled with care.
Admins who have only one site can stop reading now, as they have swerved the complexity by only having one PSC. This means the upgrade becomes a simple affair and the upgrade scenario is included in the out-the-box upgrade.
For the rest of us, here is what you need to do.
If you are not running 5.5, you need to do the 5.0 to 5.5 upgrade. At that point you can then perform this upgrade. The details on how to do this can be found here (PDF, page 44).
It is prudent to check beforehand that all your infrastructure has fully functioning and accurate DNS and time servers. NTP and DNS are key. Another important bit of housekeeping: admins sometimes get caught out when one or more of the virtual machines has a CD-ROM attached. Leaving a CD attached prevents live migration between cluster nodes, so the host evacuation never completes. Worse, DRS won’t work with that VM. Nothing worse than a lopsided cluster! There are plenty of ways to get a list of attached CDs, including PowerShell and RVTools (if you haven’t checked this out yet, you are missing an extremely useful tool). Next, ensure all the PSCs are running the same code revision.
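For the CD-ROM housekeeping mentioned above, here is a minimal Python sketch of one way to flag the offending VMs from an inventory export. It assumes you have already exported a CSV of CD-ROM devices by some means. The column names `VM` and `Connected`, the sample data and the function name are all hypothetical, so adjust them to match whatever your export actually contains.

```python
import csv
import io


def vms_with_connected_cdrom(csv_text, vm_column="VM", connected_column="Connected"):
    """Return the sorted names of VMs that have at least one connected CD-ROM.

    The column names are assumptions about the export format, not a
    documented layout of any particular tool.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    offenders = set()
    for row in reader:
        # Treat a missing or empty cell as "not connected".
        connected = (row.get(connected_column) or "").strip().lower()
        if connected == "true":
            offenders.add((row.get(vm_column) or "").strip())
    return sorted(offenders)


# Made-up sample export, purely for illustration.
SAMPLE_EXPORT = """VM,Connected
app01,True
db01,False
web02,True
"""

if __name__ == "__main__":
    for name in vms_with_connected_cdrom(SAMPLE_EXPORT):
        print(f"CD-ROM still connected: {name}")
```

Anything this prints is a VM worth disconnecting before you start evacuating hosts.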
This is easy to do. Just log into the PSC URL and it should show the version in question. Assuming they are all patched to the same release, the next step is to upgrade the PSCs. PSCs can be snapshotted, and you’d be crazy not to: you never know when you will need to roll back. Then, using the vCenter 6.5 release ISO that you want to go to, run the upgrade for each PSC. The best advice is to do the PSC upgrades back to back, one after the other, until they are all done.
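If you have more than a couple of PSCs, it can help to write the versions down and compare them rather than trusting your memory. The sketch below assumes you have noted each PSC's version by hand from its web page. The hostnames and version strings are invented placeholders, and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_version(psc_versions):
    """Map each version string to the sorted list of PSCs reporting it."""
    groups = defaultdict(list)
    for host, version in psc_versions.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def pscs_are_consistent(psc_versions):
    """True only when at least one PSC is listed and all report the same version."""
    return len(set(psc_versions.values())) == 1


# Placeholder hostnames and version labels, recorded manually.
RECORDED = {
    "psc-site-a.example.com": "release-X",
    "psc-site-b.example.com": "release-X",
    "psc-site-c.example.com": "release-W",
}

if __name__ == "__main__":
    if pscs_are_consistent(RECORDED):
        print("All PSCs match; safe to plan the PSC upgrade.")
    else:
        for version, hosts in group_by_version(RECORDED).items():
            print(f"{version}: {', '.join(hosts)}")
```

If more than one version shows up, patch the stragglers to the common release before going anywhere near the 6.5 installer.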
Once the PSC is at the new patch level, it is now OK to upgrade the vCenter(s). Re-run the installer that was used for the PSC upgrade and use it to run the vCenter upgrade (both components are on the one disk).
It’s important to get the upgrades done as soon as possible because running different code versions is never a good idea. Once installed, check the estate over and make sure it is clean. Look for any nasty warnings or alarms.
Proceed with caution (RTFM)
The last task is to upgrade the hosts. This isn’t so critical and can be staged across a more relaxed timeline. If you haven’t already, learn how to use VUM to automate those host upgrades. It is what VUM was designed for.
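The ordering described here (every PSC first, then the vCenters, then the hosts) is simple enough to sanity-check before anyone touches an installer. The following is only an illustrative sketch: the plan format, component labels and names are assumptions made for the example, not anything VMware ships.

```python
# Tiers in the order the article recommends upgrading them.
UPGRADE_ORDER = ["psc", "vcenter", "host"]


def first_premature_step(steps):
    """Return the first (kind, name) step that runs before an earlier tier is finished.

    `steps` is a list of (kind, name) tuples in planned execution order.
    A step is premature if any later step belongs to an earlier tier,
    for example a vCenter upgrade scheduled ahead of a remaining PSC.
    Returns None when the plan respects the tier order.
    Raises KeyError if a kind is not listed in UPGRADE_ORDER.
    """
    rank = {kind: position for position, kind in enumerate(UPGRADE_ORDER)}
    for index, (kind, name) in enumerate(steps):
        later_steps = steps[index + 1:]
        if any(rank[later_kind] < rank[kind] for later_kind, _ in later_steps):
            return (kind, name)
    return None


# Hypothetical plan that repeats the mistake from the story above.
BAD_PLAN = [
    ("psc", "psc-a"),
    ("vcenter", "vc-a"),
    ("psc", "psc-b"),
    ("host", "esx-01"),
]

if __name__ == "__main__":
    problem = first_premature_step(BAD_PLAN)
    if problem is None:
        print("Plan order looks sane.")
    else:
        print(f"Too early: {problem[0]} {problem[1]}")
```

A check like this won't save you from a bad installer run, but it would have caught the plan that took VMware support several days to untangle.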
One thing that sometimes catches out administrators is a legacy setup that has a raw device attached. Plan around this, and schedule downtime for it!
Other gotchas include checking the VMware Hardware Compatibility List. If you have older hardware, be prepared to bin it! Running hardware that is too old means your virtualization experience is going to be sub-optimal anyhow.
That said, there is a danger with some applications (such as middleware by a certain American company whose first initial is I) that are licensed in terms of CPU capacity ratings rather than per socket. In one specific case of moving from older hardware, I saw the licence cost quadruple. It took years to sort that one out.
As always, it comes down to who is paying for the difference. Take it from someone who has been there: the administrator needs to get the users (or whoever pays for it) onboard, and do it early.
Ensure that the VMware hardware versions and tools are upgraded. As always, tools versions first, then a virtual hardware upgrade. Again, these don’t have to be done right away, but upgrading to new hardware versions brings benefits and VMware tools should always be on the latest version anyhow. A tip: don’t blindly apply the tools updates. I have seen instances where VMtools caused some odd behaviour (a DLL was overwritten and screwed some functionality). Snapshot is your friend, but remember you can’t use snapshots for hardware upgrades!
Lastly, don’t forget to upgrade those licences and apply them to the hosts and vCenter. Sure, you get a small grace period but things like this can be forgotten. Cue panicked upgrades in the portal and applying them.
The upgrade is very worthwhile but creating a plan and getting the owners/users onboard is the order of the day. The actual process isn’t that bad as long as you follow the documentation.
6.5 brings a lot of new stuff to the table, such as encrypted VMs, increased VM sizes and more intelligent DRS, and that’s just for starters. It’s in your interests to upgrade. ®