Deleted cloud in second fall from sky
More off-demand computing from FlexiScale
XCalibre's FlexiScale cloud has disappeared from the heavens. Again.
In late August, an engineer with the UK-based hosting outfit accidentally deleted the company's high-profile compute cloud - which offers on-demand storage, processing, and network bandwidth a la Amazon Web Services - and now XCalibre is working to resolve a "core network failure" that has kept some customers off-line for as much as twenty-four hours.
According to XCalibre CEO Tony Lucas, the outage hit at about 5pm UK time on Wednesday, when the cloud experienced "a near simultaneous switch failure" in the switches that connect the storage to the processing nodes. "That is relatively easy to fix, though you do have to take everything down and restart it again," Lucas tells The Reg. "But because of a software limitation in a particular piece of software we use...which only allows you to do one job at a time, so when we have to restart hundreds and hundreds of servers, it takes some time."
It's no secret that FlexiScale relies on Virtual Iron, the virtualization manager based on the open-source Xen hypervisor.
Lucas says that some customers were back up and running by 9pm UK yesterday. But others are still waiting for their bit of cloud to reappear. "Every single server was restarted by 7:45 this morning [UK time], but there is a network bug that a number of them are still having issues with. We're going through them one-by-one and we're down to a handful - somewhere in the teens."
Lucas is intent on beefing up his architecture so this sort of thing doesn't happen in the future. But that's twice in two months. At the end of August, that engineer accidentally deleted the cloud's main storage volume, and XCalibre needed several days to rebuild it.
And in the midst of the latest outage, some customers are peeved. "I am angry, very angry, so yes there's some vitriol in here, I was hoping that sleeping on it would dull that, but being that all my servers are still down it hasn't," says someone who calls himself Flish.
"I didn't have to wait very long for the next outage," says RichText. "Fortunately, I'm only testing things out at the moment. Does anyone actually use Flexiscale for anything mission-critical?"
A good question. There are drawbacks to putting your apps in the sky. In recent months, we've also seen plenty of downtime from Amazon Web Services - and Google Apps too. ®
COMMENTS
What are clouds made of?
Where I live, clouds come and go all the time. I prefer it when there are no clouds, it's nicer.
As for clouds with computers, we all know mixing water and the electricity used by computers will eventually lead to disasters. Anyone using a computer in a cloud should expect the odd spark or two.
Honest Answer
Well, to give some balance, have had Tony from Flexiscale on the phone giving me an open and honest explanation of what happened and why. Doesn't change the end result, and he doesn't deny that it wasn't wrong, they got burnt again, before they've had the opportunity to apply the fixes they need. But applaud Tony for his honesty after the fact.
judging by the comments
I'm not the only one that thinks farming out your critical systems to a 3rd party company is a Bad Idea. OK if you are running a smallish website, but you don't know what the failover is (if there is any, as cloud is supposed to be failsafe yes?), you don't know their backup system, and you don't know actually how skilled the guys at the other end are.
