OVH goes TITSUP again while trying to fix its last TITSUP

Attempt to harden network failed, badly, so the call's gone out to Cisco for help

By Simon Sharwood

European web hosting outfit OVH has reported its second major outage and Total Inability To Support Usual Performance* in a month and admitted the new outage was caused by its attempts to fix the cause of the last one.

OVH attributed its November outages to power problems and cable cuts.

But this incident notice filed by CEO and founder Octave Klaba on Wednesday 6 December stated “the problem was related to a software bug on the equipment we use which caused the deletion of the configuration.”

The notice continued: “Since then we have updated the equipment on everything our network. Also to prevent this type of bug from never [sic] again causes a worry [sic] about our DCs, we have decided to divide equipment clusters into 3 on the RBX website. So, if we ever have again this bug, the configuration would only impact 30% traffic.”

The company planned to change to that new regime late on 6 December, 2017, European time. But the changeover failed, causing connectivity problems and outages in Europe and beyond.

“During the preparation of the maintenance that was to start at 23:00, the configuration disappeared again at 8:20 pm and all the links were down again!!!!!” Klaba's notice said.

Those are Klaba's exclamation marks, by the way. It's understandable he used so many because the next sentence is: “The database has been deleted while we are using the latest software version. So there is another bug!”

Next step? “We look with Cisco to understand why all the links are not UP while the configuration was delivery to RBX.”

The outage has made for an ugly Status page at OVH, as depicted below from the time of writing.

OVH's Status Page: red lights everywhere.

If there's a small piece of upside in this incident, it's that it struck late in the European evening and continued into the small hours, times when traffic is low and some customers may not notice massive impact on their operations.

But there will also be plenty who were impacted, and irritated, and wondering why they give their business to a company that has also experienced flood damage and can't configure routers well enough to avoid this sort of thing.

OVH has promised to send The Register a statement about the incident. ®

* Total Inability To Support Usual Performance = TITSUP
