OVH goes TITSUP again while trying to fix its last TITSUP

Attempt to harden network failed, badly, so the call's gone out to Cisco for help

By Simon Sharwood, APAC Editor

Posted in Cloud, 7th December 2017 00:24 GMT

European web hosting outfit OVH has reported its second major outage and Total Inability To Support Usual Performance* in a month, and admitted the new outage was caused by its attempts to fix the cause of the last one.

OVH attributed its November outages to power problems and cable cuts.

But an incident notice filed by CEO and founder Octave Klaba on Wednesday 6 December stated “the problem was related to a software bug on the equipment we use which caused the deletion of the configuration.”

The notice continued: “Since then we have updated the equipment on everything our network. Also to prevent this type of bug from never [sic] again causes a worry [sic] about our DCs, we have decided to divide equipment clusters into 3 on the RBX website. So, if we ever have again this bug, the configuration would only impact 30% traffic.”

The company planned to change to that new regime late on 6 December, 2017, European time. But the changeover to the new systems has failed and caused connectivity problems and outages in Europe and beyond.

“During the preparation of the maintenance that was to start at 23:00, the configuration disappeared again at 8:20 pm and all the links were down again!!!!!” Klaba's notice said.

Those are Klaba's exclamation marks, by the way. It's understandable he used so many because the next sentence is: “The database has been deleted while we are using the latest software version. So there is another bug!”

Next step? “We look with Cisco to understand why all the links are not UP while the configuration was delivery to RBX.”

The outage has made for an ugly Status page at OVH, as depicted below from the time of writing.

OVH's Status Page: red lights everywhere.

If there's a small piece of upside in this incident, it's that it struck late in the European evening and continued into the small hours, times when traffic is low and some customers may not notice massive impact on their operations.

But there will also be plenty who were impacted, and irritated, and wondering why they give their business to a company that has also experienced flood damage and can't configure routers well enough to avoid this sort of thing.

OVH has promised to send The Register a statement about the incident. ®

* Total Inability To Support Usual Performance = TITSUP
