The Register® — Biting the hand that feeds IT

One titsup server kills Brit-hosted Donhost websites for THREE DAYS

Where was the backup?

A fault in just one server at Brit web hosting biz Donhost took out thousands of websites and emails for more than three days.

The service slowly found its feet again on Wednesday afternoon. The company officially confirmed that it fell over at 7am on Monday, but affected customers posting in a help forum put the start of the downtime even earlier, saying they first noticed problems on Sunday night.

Email processing was knackered, too. Donhost assured folks that no messages were lost, though users reported that emails to the affected domains were bouncing.

Donhost, a trading name for British company Webfusion Ltd, fessed up to the cock-up on its service status page:

Server 50 connectivity issues: Due to a full system failure our administrators have been unable to recover the server to deliver websites and associated services.

We have created a new server to host the sites and services on and our system administrators are currently restoring all sites from our fail safe backups.

Donhost pinpointed the fault in just one server - doomed server 50 - leaving many users perplexed by how one machine could cause so much trouble and why there wasn't sufficient backup kit in place.

According to posters in the forums, Donhost recommended new resellers move to Heart Hosting - another Webfusion-owned server business. Donhost primarily offers premium business hosting and dedicated servers, and sells its services to clients that resell them to punters.

Donhost says that no material has been lost, the flaw has been fixed, and email handling will return to normal.

We asked parent company Webfusion for a comment on why the problem took so long to fix and what assurances it can offer customers. We will update this piece if they get back to us. ®

Updated to add

A Webfusion spokesman has been in touch to say: "It was a technical failure that resulted in a complete system overhaul, which regrettably took longer than we had anticipated. All customers are now fully operational and our teams are helping customers on a case by case basis."

Maximum incompetence

Not much else to say, really.

Maybe once they're laid off, they can find a job at Microsoft, managing Azure. They'd fit right in.

Clarification from Webfusion ..

How long does it take to swap out the hardware and restore from last night's backups? I would suspect that most of the delay was in finding someone technical enough to do the job, as they fired most/all of the technical staff - to save money.

"It was a technical failure that resulted in a complete system overhaul, which regrettably took longer than we had anticipated"

That's totally cleared up the issue for me.

Re: Was it a windows server?

I have to agree, this is a damagement issue.
