Power and cooling in the data centre
Hot and hotter
Power and cooling are the critical services a data centre facility must deliver, yet diagnosing and fixing problems with them while keeping the infrastructure running can be surprisingly difficult. And they have never been more important, as virtualisation concentrates ever more critical resources in the data centre.
For example, the numbers might suggest that the right volume of cooling air is being pumped into the box, yet some servers are overheating and threatening to breach the service level agreement (SLA). What do you do?
Here we'll look at some common problems and fixes.
Much attention has recently focused on data centres cooled by fresh air, but the traditional design, a closed box cooled by chillers and laid out as open hot aisles and cold aisles, remains dominant. In such a facility, first check that the numbers add up: one watt of cooling should be supplied for each watt of computing power.
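The watt-for-watt rule is simple arithmetic, so it is easy to script against an equipment inventory. The sketch below is a minimal illustration, not a sizing tool: the function name `cooling_shortfall_kw` and all the figures are hypothetical, and it assumes you already have per-rack IT loads and the rated cooling capacity of each CRAC, both in kilowatts.

```python
def cooling_shortfall_kw(it_loads_kw, cooling_capacity_kw):
    """Return how many kW of cooling are missing (0.0 if supply covers the load).

    Applies the one-watt-of-cooling-per-watt-of-computing rule of thumb:
    the cooling required equals the total IT load.
    """
    total_it = sum(it_loads_kw)
    return max(0.0, total_it - cooling_capacity_kw)

# Hypothetical figures for illustration only.
rack_loads_kw = [4.5, 6.0, 12.0, 3.5]   # measured IT load per rack
crac_rated_kw = [10.0, 10.0]            # rated cooling capacity per CRAC

shortfall = cooling_shortfall_kw(rack_loads_kw, sum(crac_rated_kw))
print(f"IT load {sum(rack_loads_kw):.1f} kW, "
      f"cooling {sum(crac_rated_kw):.1f} kW, "
      f"shortfall {shortfall:.1f} kW")
```

A real check would also leave headroom for redundancy, since a room that only balances when every cooling unit is running has nothing in reserve if one of them fails.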
Hot air needs to be pumped out, so the capacity of the equipment removing it needs to match that of the chiller pumps. If the two systems came from different manufacturers, an external contractor may be able to verify that they are in fact working in lockstep.
Here for the CRAC
CRACs (Computer Room Air Conditioners) offer cooling, heating, humidification and dehumidification, so all the CRACs working in the same area need to be operating in the same mode. If they are unco-ordinated, they waste energy fighting each other as one attempts to humidify while another dehumidifies.
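Spotting CRACs that are fighting each other is essentially a grouping exercise: collect the units by zone and look for pairs running opposing modes. This is a hypothetical sketch; it assumes each unit reports a single active mode as a string, and the zone names, unit IDs and the `find_mode_conflicts` helper are invented for illustration rather than taken from any building management system.

```python
from collections import defaultdict
from itertools import combinations

# Mode pairs that work against each other when run side by side in one zone.
OPPOSING = {frozenset({"humidify", "dehumidify"}), frozenset({"heat", "cool"})}

def find_mode_conflicts(units):
    """units: list of (unit_id, zone, mode) tuples, one active mode per unit.

    Returns a list of (zone, unit_a, unit_b) for every pair of units in the
    same zone whose modes oppose each other.
    """
    by_zone = defaultdict(list)
    for unit_id, zone, mode in units:
        by_zone[zone].append((unit_id, mode))

    conflicts = []
    for zone, members in sorted(by_zone.items()):
        # Compare every pair of units within the zone.
        for (id_a, mode_a), (id_b, mode_b) in combinations(members, 2):
            if frozenset({mode_a, mode_b}) in OPPOSING:
                conflicts.append((zone, id_a, id_b))
    return conflicts

# Hypothetical unit states.
units = [
    ("CRAC-1", "hall-A", "humidify"),
    ("CRAC-2", "hall-A", "dehumidify"),
    ("CRAC-3", "hall-B", "cool"),
    ("CRAC-4", "hall-B", "cool"),
]
print(find_mode_conflicts(units))  # [('hall-A', 'CRAC-1', 'CRAC-2')]
```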
If the CRACs are not delivering adequate cooling throughout the room despite being correctly specified and configured, check the temperature of the cooling fluid, whether water or glycol, by measuring it directly at the surface of the pipe.
If cooling is patchy, with equipment in some racks reporting problems, the cause may be poor air distribution, which lets hot exhaust air from the back of the rack recirculate round to the front. This is a particular problem in racks that are not fully populated, where the empty slots give the air an easy path to the front.
To check, record the temperature in each aisle at a height of 1.5 metres above the floor, then compare the readings to locate the hotspots. You should expect temperatures of between 20 and 25°C. If the hotspots coincide with partially populated racks, blanking plates to close off the empty slots may be the simple answer.
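The survey amounts to comparing each reading against the 20 to 25°C band. As a minimal sketch, assuming the readings have already been collected into a dictionary keyed by location (the labels and values here are made up), the hypothetical `out_of_band` function lists every reading outside the band, hottest first, marking each as too hot or too cold.

```python
# Expected band, in °C, for readings taken 1.5 m above the floor.
LOW_C, HIGH_C = 20.0, 25.0

def out_of_band(readings):
    """Return (location, temperature, label) for readings outside the band, hottest first.

    readings: dict mapping a location label to a temperature in °C.
    label is "hot" above the band and "cold" below it.
    """
    flagged = []
    for loc, temp in readings.items():
        if temp > HIGH_C:
            flagged.append((loc, temp, "hot"))
        elif temp < LOW_C:
            flagged.append((loc, temp, "cold"))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical survey results.
survey = {
    "aisle 1, rack 3": 22.5,
    "aisle 1, rack 7": 27.8,
    "aisle 2, rack 1": 24.9,
    "aisle 2, rack 5": 29.1,
    "aisle 3, rack 2": 18.6,
}
for loc, temp, label in out_of_band(survey):
    print(f"{label:4} {loc}: {temp:.1f} °C")
```

Readings below the band are worth noting too: they suggest cold air is being wasted on one part of the room while another part runs hot.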
Bear in mind, too, that customers in a co-location facility may install equipment without logging it correctly, resulting in unexpected hotspots.
Having found the hotspots, check the velocity of the air coming up through the floor grilles. The underfloor air velocity may be too high, which can be counter-productive: fast-moving air lowers the pressure beneath nearby grilles, so room air is sucked down into the floor void instead of cold air being pushed up. The underfloor air path can also easily become obstructed, while voids and missing floor tiles can upset the underfloor air pressure.
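A grille reading only means something when set against what the rack in front of it needs. The sketch below converts a measured grille velocity into volumetric flow (velocity times free area) and compares it with the airflow needed to carry a rack's heat away, using the sensible-heat relation airflow = P / (ρ·c_p·ΔT). It is a rough illustration under stated assumptions: the air properties are typical room-temperature values, the 10 K temperature rise across the servers and the 0.09 m² free grille area are hypothetical, and `check_tile` is an invented helper. A negative velocity stands in for air being sucked down through the grille.

```python
AIR_DENSITY = 1.2   # kg/m^3, approximate for room-temperature air at sea level
AIR_CP = 1005.0     # J/(kg*K), specific heat of air at constant pressure

def required_airflow_m3s(rack_kw, delta_t_k):
    """Airflow (m^3/s) needed to remove rack_kw with a delta_t_k rise across the servers."""
    return rack_kw * 1000.0 / (AIR_DENSITY * AIR_CP * delta_t_k)

def grille_airflow_m3s(velocity_ms, free_area_m2):
    """Volumetric flow through a grille; a negative velocity means air is flowing downwards."""
    return velocity_ms * free_area_m2

def check_tile(rack_kw, velocity_ms, free_area_m2, delta_t_k=10.0):
    """Compare supplied and required airflow for one rack (hypothetical helper)."""
    supplied = grille_airflow_m3s(velocity_ms, free_area_m2)
    needed = required_airflow_m3s(rack_kw, delta_t_k)
    if supplied <= 0:
        return "reverse flow: grille is drawing air down"
    if supplied < needed:
        return f"short by {needed - supplied:.3f} m^3/s"
    return "ok"

# Hypothetical readings: a 5 kW rack in front of a grille with 0.09 m^2 free area.
print(check_tile(5.0, 3.0, 0.09))   # short by 0.145 m^3/s
print(check_tile(5.0, -0.5, 0.09))  # reverse flow: grille is drawing air down
```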
Finally, it sounds daft, but it's worth checking that all the rack-mounted gear is installed the right way round, so that its vents exhaust into the hot aisle.
All the above is about responding to problems, but it makes just as much sense to be proactive and check regularly that the system is delivering as it should. That way you can spot a failing CRAC unit running below capacity before it becomes an emergency, perhaps one triggered by the installation of a new, power-hungry rack full of blade servers. ®