Original URL: https://www.theregister.com/2010/06/04/power_down/

Idle gear: It's too darn hot

Power management - use it or lose it

By Trevor Pott and Iain Thomson

Posted in Systems, 4th June 2010 10:11 GMT

Sysadmin Blog IT projects can arise from the most interesting of circumstances. One project begets another, and another, and down the rabbit hole you go. The subject of this article is power management: a project brought to the forefront by, of all things, our upcoming replacement of desktops with low-power Wyse thin clients. If that seems bizarre to you, allow me the chance to explain.

The project at hand is the replacement of virtually all desktops in service with Wyse thin clients, which will force the last of our users onto VDI and, for all intents and purposes, prevent them from running anything locally on the system that sits on their desk. From a desktop power management standpoint this is grand, because these little Wyse boxes absolutely sip power. They have no moving parts, and the LCD screens they drive give off more heat (and pull more wattage) than the thin clients themselves.

How then does this lead to a power management article? Well, all those cycles that used to be consumed on the desktop went somewhere: right into my datacenter (DC). From a performance standpoint, I’m ready for this.

I’ve been running VDI for a while, and a little bit of extra loading is really only running the servers closer to maximum utilization, rather than running us out of capacity. The issue at hand is one of heat. Running the servers closer to the redline has increased the thermals in the datacenter just enough to cause various things that go ‘ping’ to email me several times a day complaining that they are a little too warm.

I’ve gotten away with it for the past six months largely because the Canadian prairies tend towards long, cold winters, and our outside air system kept the DC as cold as I could possibly want it. Well, I’ve had the (fingers crossed) last snowfall of spring here, and very soon it’s going to be 35 Celsius outside. At those temperatures, the pair of A/C units in place simply aren’t going to cut it.

I am additionally paranoid about one of them failing and the other being unable to cope on its own. A third A/C unit is on its way, but it could be a couple of months before it’s fully installed and ready to rock.

Even better, our few non-thin-client desktops are all together in one area; an area that has seen quite a lot of equipment rearrangement over the past several months, and which will itself be facing thermal issues this summer. Looking at the weather forecast, I have a week to figure out everything there is to figure out about power management, after which the organic fertilizer will encounter the rotary air recirculation mechanism. Not being the sort to ever do research twice if I can avoid it, I plan to apply the power management tricks I learn for keeping my servers powered down to the remaining full-fat desktops, and even to their thin client brethren.

The project at hand is essentially to create a system by which I can perform lights out management (LOM) on every system on my network, while powering anything and everything down when it is not needed. Having already virtualized everything that can be virtualized, the low-hanging fruit that is left is all the gear I leave on hot standby.

These systems really should be on cold standby, but I leave them up because I periodically need to patch them, change their configs, etc. To run these in cold standby I need to be able to bring them out of a sleep state, and preferably to be able to access their console remotely in case there is some sort of error on boot. In addition there is spare networking gear that is similarly just sitting around waiting for the primary to fail.

There are also some systems that do their job, and then don’t need to be heard from again for a while. They could be powering down during this “dead time,” not pumping into the air the hundreds of watts of heat they generate just sitting there, and allowing the A/C units to get the air mass just that much colder.
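As a sketch of what that could look like on a single box, the loop below watches CPU utilization and powers the machine off once it has sat idle long enough, trusting a remote wake-up (more on that below) to bring it back when needed. This is only an illustration: it assumes a Linux host with the psutil library installed, and the thresholds are arbitrary examples.

```python
# Illustrative only: power a box off after a sustained idle period.
# Assumes Linux, root privileges, and the psutil library (pip install psutil).
import subprocess
import time

import psutil

IDLE_THRESHOLD = 5.0   # per cent CPU; arbitrary example value
IDLE_MINUTES = 30      # how long the box must stay idle; also arbitrary

idle_since = None
while True:
    # cpu_percent(interval=60) blocks for a minute, then returns average load.
    if psutil.cpu_percent(interval=60) < IDLE_THRESHOLD:
        idle_since = idle_since or time.time()
        if time.time() - idle_since > IDLE_MINUTES * 60:
            subprocess.run(["shutdown", "-h", "now"], check=True)
            break
    else:
        idle_since = None  # activity seen; reset the idle clock
```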

When I think about it, we have some big multifunction printers scattered about the office that pull quite a bit of juice. Some of the newer ones can be programmed to go to sleep and wake up on print or at a specific time, but the older MFCs and the smaller printers just don’t have that feature. This is where I start getting into the world of wake-on-LAN. It’s amazing what WOL can do.

As the name suggests, WOL lets you wake a properly configured network-attached device from a sleep state by sending it a specially crafted “magic packet.” WOL has more recently been superseded by the Desktop and mobile Architecture for System Hardware (DASH) standard, but devices incorporating it are still pretty rare.
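The magic packet itself is simple enough to build by hand: six bytes of 0xFF followed by the target’s MAC address repeated 16 times, sent over UDP broadcast. A minimal Python sketch, with placeholder addresses to be swapped for your own:

```python
# A minimal wake-on-LAN sender; the addresses below are placeholders.
import socket

def send_magic_packet(mac: str, broadcast: str = "255.255.255.255") -> None:
    """Send a WOL magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        # Port 9 (discard) is the conventional WOL port; 7 is also common.
        sock.sendto(packet, (broadcast, 9))

if __name__ == "__main__":
    send_magic_packet("00:11:22:33:44:55")  # placeholder MAC
```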

A lot of devices can be controlled via SNMP as well. I have some reasonably rudimentary uninterruptible power supplies (UPSes), but they do come with network management cards and I can do really neat things to them through either their built-in web server or via SNMP. Although I currently don’t have any, there are also power distribution units (PDUs) which are in essence glorified power bars that also offer network management capabilities.
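To illustrate the SNMP side, here is a sketch of polling a UPS management card for its estimated battery runtime using the pysnmp library. It assumes the card implements the standard RFC 1628 UPS-MIB; many vendors use their own enterprise MIBs instead, so treat the OID and address as placeholders for whatever your card actually exposes.

```python
# Sketch: read a UPS's estimated runtime over SNMP (pip install pysnmp).
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

UPS_HOST = "192.0.2.10"  # placeholder management-card address
# upsEstimatedMinutesRemaining from the RFC 1628 UPS-MIB
MINUTES_REMAINING_OID = "1.3.6.1.2.1.33.1.2.3.0"

error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public"),  # read-only community string
        UdpTransportTarget((UPS_HOST, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(MINUTES_REMAINING_OID)),
    )
)

if error_indication or error_status:
    print("SNMP query failed:", error_indication or error_status.prettyPrint())
else:
    for oid, value in var_binds:
        print(f"Estimated battery runtime: {value} minutes")
```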

I also have a growing fleet of Intel vPro desktops, which make up a little over 50 per cent of the non-thin-client PCs I have to manage. vPro is a complete LOM system for an individual PC, so this is a huge bonus for this project. Similarly, my newer generation of servers could be upgraded to use Asus’ ASWM 2.0 LOM card, though in truth none of the servers in place are actually equipped with one.

It is also worth noting that I have based my entire fleet of file servers on Intel vPro motherboards. Though the Q35 chipset is technically not considered a server platform, I was so impressed by it that I decided to base our file servers on it.

Field systems have to meet some pretty extreme I/O requirements here; it is not uncommon to push them past their thermal limits and have a southbridge go up in a puff of smoke. Sadly, as a general rule, Intel chipsets have been the first to fold under extreme load. Not so the vPro chipsets; the Q35 and Q57 have impressed me deeply. It was because the Q35 survived our I/O testing, in combination with the LOM capabilities inherent in the board, that I felt it would be a decent basis for my file server fleet. (With the addition of a real RAID card, natch.)

You won’t see me evangelize many products, but if you are thinking about putting an Intel processor in your box make absolutely sure it has vPro. Not only because it offers neat LOM tools, but in my experience vPro chipsets can simply take more punishment than the more standard Intel fare. (Why that may be I have absolutely no idea, but prototype testing bears it out with consistency.)

So while I have some desktops and some servers with full-blown LOM setups, I can’t count on all my systems being so equipped. It is fortunate, then, that I have invested in a small IPKVM. This, combined with WOL, can provide me full LOM for the servers in my DC. Desktops without vPro, however, are going to have to muddle through without full LOM abilities.

Time to gather banners. If I put all the tools I have at my disposal together, I should be able to do some neat stuff that can help me conserve power, reduce heat generation, and still remotely manage my systems regardless of the power state they are in.

PCs and servers will pretty universally respond to WOL. Their operating systems can shut them down when idle, and with the right management software, I can wake them up on a predetermined schedule, or just whenever I need to poke at them remotely. Between vPro, my IPKVM, and possibly some as-yet-unpurchased server LOM cards I should be able to get complete remote console access to these systems as well. I can’t cover off all systems in service this way, but I’ll take what I can get.
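The scheduled wake-up side can be as simple as a script fired from cron or Task Scheduler shortly before the patch window. The sketch below reuses the send_magic_packet() helper from the earlier WOL example (assumed here to be saved alongside it as wol.py); the host list is made up for illustration:

```python
# Sketch: wake a list of cold-standby boxes before a maintenance window.
# Assumes the earlier WOL helper was saved alongside this script as wol.py.
from wol import send_magic_packet

STANDBY_HOSTS = {
    "backup-server": "00:11:22:33:44:55",   # placeholder MACs
    "spare-vdi-node": "66:77:88:99:aa:bb",
}

for name, mac in STANDBY_HOSTS.items():
    print(f"Waking {name} ({mac})")
    send_magic_packet(mac)
```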

I have printers and UPSes that will respond to SNMP, and I will have to pick up some PDUs that can do the same. With SNMP-controlled PDUs, I can set up smaller devices, monitors, or anything else I find pulling down a fair amount of juice to turn off after hours. Perhaps more importantly, I can rig up critical items, such as switches or my IPKVM, so that if they do something strange and need a reboot, I can cycle the power on them remotely.
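To sketch what that remote power cycle might look like, the snippet below sends an SNMP set to a switched PDU’s outlet-control OID, again with pysnmp. The OID and command values follow the common pattern of APC-style switched PDUs (1 = on, 2 = off, 3 = reboot), but they are vendor-specific: treat everything here as a placeholder and pull the real values from your PDU’s MIB.

```python
# Sketch: power-cycle one PDU outlet over SNMP. OIDs are vendor-specific;
# the one below follows the APC-style pattern and is only an example.
from pysnmp.hlapi import (
    setCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity, Integer,
)

PDU_HOST = "192.0.2.20"  # placeholder PDU address
OUTLET = 4               # outlet feeding the misbehaving switch
OUTLET_CTL_OID = f"1.3.6.1.4.1.318.1.1.4.4.2.1.3.{OUTLET}"  # example only
REBOOT = 3               # command value meaning "power-cycle" on many PDUs

error_indication, error_status, _, _ = next(
    setCmd(
        SnmpEngine(),
        CommunityData("private"),  # read-write community string
        UdpTransportTarget((PDU_HOST, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(OUTLET_CTL_OID), Integer(REBOOT)),
    )
)

if error_indication or error_status:
    print("SNMP set failed:", error_indication or error_status.prettyPrint())
else:
    print(f"Sent reboot command to outlet {OUTLET}.")
```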

What I lack is the key bit that ties this all together: management software. There are plenty of alternatives from plenty of vendors, and in my next article I will take the time to dive in and tear a few of them up. While there are of course a plethora of commercial offerings to help you with your enterprise desktop and device management, the Open Source community’s offerings are a little bit more impenetrable. ®