UPS death in Pulsant data centre knocks out websites
Four-minute outage sparks domino effect
IT infrastructure company Pulsant suffered a power outage at its Maidenhead data centre last night that cut websites off from the internet.
Although electricity was restored soon after by routing around a dead uninterruptible power supply, the blip knocked out web hosting firms' servers and firewalls, and issues were still ongoing hours later.
Bringing systems back up and recovering data after a forced hard shutdown takes time, so some businesses were still sorting themselves out today.
Pulsant told The Register that the problem was "a partial power disruption during scheduled maintenance to part of our Maidenhead data centre facility overnight [which] did not affect the whole site or facility".
The company, which used to be known as Lumison and also now owns Blue Square, initially reported on its status page that it was a network problem, which infuriated customers on Twitter.
"I'd like to know why Pulsant is reporting this as a network error… that brings down power?" web hosting firm Netmotivated tweeted.
An hour after blaming the network, Pulsant said that it was having "power issues" at its Maidenhead facility.
Some customers claimed that there aren't enough staff at the data centre at night, and that the problems weren't dealt with quickly as a result.
"This isn't good enough for our clients, so we will be taking it up with their management tomorrow," hosting provider Xilo Comms tweeted last night.
However, in a lengthier update at 11.20AM BST, Pulsant insisted that there were six engineers onsite when the power went out.
The company explained that during planned maintenance, an uninterruptible power supply (UPS) failed after a device swap.
"This was due to a component failure which had been changed during the routine maintenance with a certified new manufacturer component. As the UPS device was placed back into service, the electrical load was taken over and subsequently dropped as the component failed," Pulsant stated.
"At this time, customer racks connected to the failed UPS device in the Maidenhead 2 & 3 facilities will have experienced power loss."
The UPS was bypassed to get the power back on just a few minutes later. The device is awaiting a full test by engineers before it can be brought back online tonight, Pulsant said. ®
Best one I ever had was a server dropping offline with no warning, machines above and below it in the rack were still operating perfectly so I knew the problem had to be local to that one machine.
Phoned up to ask if someone could have a quick nose at the machine and see why it wasn't responding to anything, only to get the response of "That would be the one that was belching out smoke".
Turned out that the filtering cap across the mains inlet on one of the PSUs had caught fire, but because the PSU was still operational (albeit with flames) it hadn't switched to the backup.
3 little words
"during planned maintenance"
Well, we all have a few bar-room stories there...
Proviso: I'm an old mainframe systems programmer, so naturally biased :-)
Re: 3 little words
Yup, that one's always amusing.
My favourite was an IBM System/38 that was having a microcode fix installed and was down to engineering mode for the short, planned, maintenance window. The '38 has four cooling fans, one on the CPU / cards (critical) and three on the PSU (any two will do). The engineer noticed that one of the fans on the PSU was dead and offered to swap it, as he had one in the car and it would only take a few minutes to drop from engineering mode to full power off.
15 minutes later, the thing's off and a new fan is installed in place of the dead one. Switch on and one fan, the new one, starts up. We all stand there, hoping it'll make it through the bootstrap to the boot logon so it can be shut back down, before the inevitable thermal check. It didn't.
Needless to say, he didn't have three more spares on him and went from hero to zero......