OVH data centres go TITSUP: Power supply blunders blamed
Thanks, initial testing seems unduly problematic
Updated Power outages have brought some OVH data centres to their knees, and unspecified issues have broken optical cable routing in Europe.
OVH boasts it is used by "155 out of the 1000 largest European companies," and "20 out of the 500 largest international companies."
According to the outage monitoring service downdetector.com, over 100 watchers have reported issues with OVH hosting and email since around 7:37 UTC today.
CEO Octave Klaba tweeted that "two separated 20kV lines are down," and said the team was trying to restart generators for its central Europe SBG1 and SBG4 data centres.
SBG: ERDF is trying to find out the default. 2 separated 20kV lines are down. We are trying to restart 2 generators A+B for SBG1/SG4. 2 others generators A+B work in SBG2. 1 routing room is in SBG1, the second in SBG2. Both are down. #Murphy - Octave Klaba (@olesovhcom) November 9, 2017
Optical links to its points of presence (PoPs) are also down.
RBX: all optical links 100G from RBX to TH2, GSW, LDN, BRU, FRA, AMS are down. - Octave Klaba (@olesovhcom) November 9, 2017
ovh.co.uk and ovh.com are also not loading for us here in London. However, it sounds as though OVH has been making a tiny bit of progress...
SBG: 1 gen restarted. - Octave Klaba (@olesovhcom) November 9, 2017
We called OVH's UK customer support number, and heard the message "service is closed" followed by a long "ewwwwww...." It may be unrelated.
Spencer Pryor, CEO of Your Radio in Glasgow, Scotland, told The Register that its websites and 1,000 audio streams running through OVH are down. He said between about 10 and 20 per cent of its audience was online. "It's just aggravating this morning," he said.
Luke Ellis, of Livebuzz, told The Register he saw this outage "negatively", adding: "How can a tier-four data centre experience total loss of power to its routing equipment for over three hours?! Both redundant lines and both backup generators."
OVH had a pretty nasty water cooling leak in July. We contacted a spokesperson for comment on this latest cockup. ®
Updated at 14:04 GMT November 9 to add
The biz has said that most customers should be up and running soon, and that systems were returning to normal. "In the coming days, impacted customers will receive an email to trigger SLA commitments," the CEO wrote in the most recent update on its website, here, in French.
Updated at 09:19 GMT November 10 to add
An OVH spokesperson told us: "Here is the latest update of the situation we can share so far." They said an official statement would be released shortly.
Updated at 20:15 GMT November 10 to add
Full details of the power failures are documented here (translated here). "A power outage that left three data centers without power for 3.5 hours," the note reads. "SBG1, SBG2 and SBG4 were impacted. This is probably the worst-case scenario that could have happened to us."
Essentially, there wasn't sufficient redundancy in its power supply lines, so when one cable failed, it all went down, down, down.
PS: TITSUP stands for Total Inability to Support Usual Phhhhuuuu... how many hours?!