Are you being robbed of sleep by badly designed servers?
Mornings, nights, they all blur into one for our man Trevor
Sysadmin Blog: How should we design the servers and end-user computers of the future?
The construction of my testlab has given me the opportunity to play with technologies I normally wouldn't be able to get my hands on. The "advanced" features in them – standard fare by now for large enterprises – have caused a measure of introspection regarding the importance of lights out management.
I disagree that the day starts at some pre-determined hour simply because it coincides with the rotation of our local patch of mud to face some fusing ball of hydrogen plasma about eight light-minutes out. When your day job is local but your readers and clients are spread all over the globe, the entire notion of day/night cycles takes on a somewhat arbitrary aspect.
Despite my disdain for traditional timekeeping, I get somewhat ornery when one of my all too brief periods of unconsciousness is interrupted to drive halfway across the city to reboot one physical server. I'm even less impressed if the resolution to the issue requires carefully walking a highly non-technical individual through identifying the right system and performing maintenance via a bored translator.
A lot of the networks I maintain have older computers. I get it, really I do. I still have systems in service dating back to the late 1990s; entire server rooms full of units from 2009 and desktops that are rebuilt servers from 2004. When this stuff was purchased even primitive lights out management was a $350 add-on to your whitebox server.
vPro – the only real hope of any sort of remote management on end-user computers if the OS was dead – had more than a few problems. It was incredibly primitive (no remote screen abilities), the selection of systems that supported it was pretty limited, and it imposed a price premium that started to hurt at volume. What hurts now is that we are still buying systems without these management technologies today.
The most common technology for solving this problem is the Intelligent Platform Management Interface (IPMI). Depending on the exact details of the implementation, IPMI can allow administrators to work with servers "below" the BIOS level. That is to say that you can log into the system's BIOS and make changes, or even flash the BIOS remotely.
In nearly every modern IPMI system, this includes an IPKVM; that is, the ability to remotely connect to a server's video output and use your local keyboard and mouse to manipulate it. Modern IPKVM systems also include the ability to mount up floppy and optical media images; this means the ability to load operating systems or use diagnostic tools from the other side of the planet, should that strike your fancy.
The Supermicro Fat Twin on my server rack gives me the opportunity to see how the other half lives. The servers in that Fat Twin all have IPMI. Indeed, even the little Supermicro mini-server that just entered my lab for review has a full IPMI setup.
In the past week alone I've reloaded the operating system on two of those Fat Twin servers nearly two dozen times as part of my testing. The servers would periodically purple screen on me (when the beta software I'm working on exploded) and I would have to remotely reboot them. All of this I managed from half a city away.
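For the curious, a remote reboot of this sort can also be scripted rather than clicked through. What follows is a minimal sketch, not a description of how I actually did it: it assumes the open-source ipmitool utility is installed on the admin workstation and that the server's management controller accepts IPMI-over-LAN connections. The address 10.0.0.21, the user name ADMIN and the helper names are made up for illustration.

```python
import os
import subprocess

# Power actions accepted by "ipmitool chassis power".
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def build_power_command(host, user, action="status"):
    """Return the ipmitool argument list for a chassis power action.

    The password is deliberately absent: the -E flag tells ipmitool to
    read it from the IPMI_PASSWORD environment variable, which keeps it
    out of the process list.
    """
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over the network
        "-H", host,        # address of the management controller
        "-U", user,
        "-E",
        "chassis", "power", action,
    ]


def run_power_command(host, user, password, action="status", timeout=30):
    """Run the action against a real management controller; return its output."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        build_power_command(host, user, action),
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if ipmitool reports a failure
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical address and user; only prints the command, runs nothing.
    print(" ".join(build_power_command("10.0.0.21", "ADMIN", "cycle")))
```

Splitting the command builder from the function that executes it means the dangerous part (power cycling a live box) can be checked on paper before anything is sent over the wire.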
IPMI implementations used to be finicky. Supermicro's own first attempts at IPKVM-integrated IPMI setups were not exactly what I'd call stable; other vendors were just as bad. This has changed.
Remote management tools are catching up. I can't crash today's Supermicro stuff. They even have a little Java app that allows me to scan for all compatible servers, manage them without having to use the webapp built into each server, and even sort them into groups for ease of management. It's far from perfect, but it is absolutely light years beyond communicating at one word a second through a translator half a world away.
The cost has plummeted as well. The days when the baseboard management controllers that provide IPMI, the IPKVM and so forth needed to cost $350 are long gone. The Supermicro mini-server's motherboard can be had for under $250 online; that includes the CPU.
If you're the size of Facebook or Google you measure your computing in acres. At that scale you have an army of competent, trained rackmonkeys to swap out dead components (or entire servers) as they fail. The added cost of lights out management – even reduced as it is – will start to get burdensome at that scale.
For the rest of us I think it's time that we – as purchasers of technology – sent a collective message that the time has come for lights out management capabilities to be provided as a standard component and not an (expensive) optional extra. The sanity you save may be your own. ®