The Register® — Biting the hand that feeds IT

Comments on: Fire at The Planet takes down thousands of websites

Even the Status page seems to be down. 

Posted Sunday 1st June 2008 17:39 GMT

Stop

Even The Planet's own site seems to be down now, as I can't access the status page link from this article either.

the forums are up 

Posted Sunday 1st June 2008 17:51 GMT

Try getting to http://forums.theplanet.com/index.php?&showtopic=90185&st=20

It's slow, but I bet there are thousands waiting for news.

I have around 25 sites hosted in the Houston centre, I'm surprised the phone hasn't started ringing yet !

The latest update from the Planet is ... 

Posted Sunday 1st June 2008 17:56 GMT

To keep you up-to-date, here is the latest information about the outage in our H1 data center.

We expect to be able to provide initial power to parts of the H1 data center beginning at 5:00 p.m. CDT. At that time, we will begin testing and validating network and power systems, turning on air-conditioning systems and monitoring environmental conditions. We expect this testing to last approximately four hours.

Following this testing, we will begin to power-on customer servers in phases. These are approximate times, and as we know more, we will keep you apprised of the situation.

We will update you again around 2:30 p.m. this afternoon.

###################################

2:30pm CDT is about 8:30pm BST, so an hour and a half till the next update.

They also have said :

"We absolutely intend to live up to our SLA agreements, and we will proactively credit accounts once we understand full outage times. Right now, getting customers back online is the most critical."

Data centres 

Posted Sunday 1st June 2008 18:02 GMT

Alien

I once had a tour of a data centre in the UK and was shocked to find their fire extinguisher system was "water sprinklers", not powder or gas. Apparently the insurance company requested that system. You don't see that system on the star ship Enterprise.

multiple sites 

Posted Sunday 1st June 2008 18:27 GMT

Heart

People should spread their servers around colocation sites - Use a provider in Edinburgh as well as London for example.

5 other datacentres? 

Posted Sunday 1st June 2008 18:42 GMT

And not one of them has a recent backup to load Houston's data so the public can be served?

lol oh well great network uptime 

Posted Sunday 1st June 2008 18:44 GMT

Gates Halo

So much for a superior network when you cannot keep the power on!

Need I say SOFTLAYER at all! The Planet is run by muppets anyway

Look on the bright side... 

Posted Sunday 1st June 2008 19:30 GMT

Thumb Up

Phorm's server(s) is(are) there, are they not? :-)

hmm 

Posted Sunday 1st June 2008 19:42 GMT

Pirate

Hmm, crap power supplier not adequately monitoring the grid delivery, and it sounds to me more likely that the system mains power transformer has been running at 115% plus power overload margin for far too long!

Mind you, I have seen and heard of at least three go with a very spectacular bang in my area, causing massive local power outages for hours on end!

*twiddles thumbs 

Posted Sunday 1st June 2008 20:08 GMT

We have a server there: http://www.cardesignnews.com

Funny we also haven't heard from any users yet... but maybe the nameservers being down will affect (outsourced) mail too? (Our servers are up and running in H2 but our nameservers are in H1).

oh dear, BT's webwise servers have gone down 

Posted Sunday 1st June 2008 20:36 GMT

oh dear, BT's phorm hosted / controlled server for the webwise system seems to have been a casualty as well, not a very resilient system BT

peter

I'm Lucky 

Posted Sunday 1st June 2008 20:51 GMT

My backup server in H1 is down. I noticed that it timed out last night when transferring from my primary servers. Damn lucky that all I use it for is backup and secondary DNS. I had a server down with 1&s****house1 last week, and I briefly toyed with the idea of moving those server customers onto the backup server before setting up a new server elsewhere. Glad I bit the bullet and set up a new server straight away instead. Even more lucky is the fact that my experience to date with The Planet has been so good that I nearly set up the new server with them.

I'm willing to bet that The Planet will get the entire Houston data center back on-line faster than 1&1 can get my server with them back on-line. But then that's not much of a bet: the 1&1 server has been off-line for 14 days now! Muppets.

@Alan

It's the weekend. Come 10:32am Monday and if you're still off-line you'll know about it, that's how long it takes customers to realise that it's not their exchange server that's the problem.

DR/Redundancy? 

Posted Sunday 1st June 2008 20:57 GMT

Joke

I'm not a webhosting or datacentre guy, but I was under the impression that there would be procedures in place to guard against this sort of outage - offsite stuff, redundant sites, etc?

If that's the case, is redundancy what is going to happen to the DR guys? :-)

Alan - maybe no-one likes your websites and doesn't care? ;-)

Steven "Only joking Alan" Raith

this is a disgrace 

Posted Sunday 1st June 2008 21:41 GMT

Thumb Down

it's nearly monday morning and b3ta isn't working

people may be forced to do some work if this isn't fixed soon

Could have been worse... 

Posted Sunday 1st June 2008 22:12 GMT

Dead Vulture

http://news.bbc.co.uk/1/hi/world/americas/7424571.stm

It may be due to recent downsizing of power consumption 

Posted Sunday 1st June 2008 22:13 GMT

Flame

According to http://www.thehostingnews.com/news-dedicated-server-firm-the-planet-data-center-manager-garners-award-4306.html, "Mr. Lowenberg conducted a six-month trial to reduce power consumption and increase data center operating efficiency. Initial results demonstrate that while critical server loads increased by 5 percent, power used for cooling decreased by 31 percent. Overall, the company experienced power reductions of up to 13.5 percent through a broad range of improvements. The new green initiatives were conducted across its six world-class data centers." and also "The Planet operates more than 150 30-ton computer room air conditioning (CRAC) units across its six data centers. In one data center alone, the company was able to turn off four of the units. The cooling requirement on two of the units was reduced to 50 percent of capacity, while another nine now operate at 25 percent of capacity. The company also extended the return air plenums on all of its down-flow CRAC units to optimize efficiency."

@By Chronos 

Posted Sunday 1st June 2008 22:28 GMT

Paris Hilton

So if the Phorm/Webwise system was operational, does that mean BT Broadband system goes t*ts up?

Oh dear BT, this Phorm/Webwise system is going to do wonders for your customer satisfaction. ... NOT!

BT customers could be leaving in Droves!

(They possibly will anyhow once they get a handle on the invasive nature of Phorm/Webwise interception of all their HTTP traffic and find out the history of Phorm (121Media) and its nasty spyware products.)

Paris, Because she loves a warm fire in her belly and she frequently goes t*ts up.

It's not Planet who should run backups of customers sites 

Posted Sunday 1st June 2008 22:31 GMT

We host a site of medium-high importance. It has been our plan that as soon as we can financially afford to set up a duplicate rack in a totally different location we will. Both sites will be load balanced and data replicated in real time. It adds at least 150% to the cost but if you need that level of resilience you have to pay for it.

Data centres are better than hosting in your office but are vulnerable to outages as many people know well. Our last outage was because some idiot (a data centre engineer?) switched off power thinking it was only going to affect someone else's rack. They may have fire suppressing gas and the best security system, but there is no technology to prevent the employment of idiots. And there will always be idiots.

what a lot of b***x 

Posted Sunday 1st June 2008 22:53 GMT

Thumb Down

While they save up on their electricity bills, small businesses are going to pay heavily!

all our 3 servers are down!

they will survive! but will we???

give them a break 

Posted Sunday 1st June 2008 22:53 GMT

come on guys....it was a fire and at the end of the day it's better keeping generators off as instructed by the fire department. servers are safer off and isolated while fires and cabling are checked over

you would think an isp... 

Posted Monday 2nd June 2008 00:56 GMT

...Would have installed a firewall.

No 'coat' tag because I can't figure out which to click with the blackberry browser.

B3ta 

Posted Monday 2nd June 2008 02:07 GMT

Unhappy

Still offline.

You never know how much you miss something until it's gone. Hope they get everything back, and soon...

Cheers

Fire fighting and computer rooms 

Posted Monday 2nd June 2008 02:52 GMT

"I once had a tour of a data centre in the UK and was shocked to find their fire extinguisher system was "water sprinklers""

A dry-pipe water system, Vesda particulate detector, and continuous staffing are the usual approach.

The Vesda system sets off an alarm and a tech with a fire extinguisher goes hunting for smoke. This allows the usual sort of computer-based fire to be handled with little damage to surrounding servers (usually they just get the power dropped as the tech drops the rack's circuits prior to removing the smoking gear, taking it outside, then opening the box and applying the extinguisher).

The water system is for the last resort, usually from a fire in another part of the building reaching the computer room. It's not unreasonable for the insurance company to sacrifice the computer room if that saves the building -- anyway, they are paying for the damage to both so it's their call.

Gas got unpopular when CPUs got small, numerous and hot and computer rooms got very, very large. If you think through the consequences of a cooling gas hitting a modern hot CPU and the problems of venting released gas from a large space you'll see the problems.

Fixed powder-based systems aren't a good fit to computers. An aerosol-based system would be a better fit.

I would appreciate help with replication 

Posted Monday 2nd June 2008 04:07 GMT

Despite the double entendre in the title, that with which I need help is replication of a MySQL database over two servers at differing locations. I have read the MySQL manuals, but I would appreciate a pointer to a tutorial or a book which explains the procedures in more detail. Currently I am working with a 1 gigabyte DB, and I would like to mirror or replicate it so I don't lose everything next time a server-farm disappears...
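For readers with the same question, here is a minimal sketch of the moving parts in classic MySQL one-way replication, written as small Python helpers that only build the text you would put in my.cnf or type into the mysql client. It assumes a MySQL 5.x-era setup; the host name, account name and password below are hypothetical placeholders, and the binary log file and position must be read from SHOW MASTER STATUS on the primary rather than copied from here. Newer MySQL releases have renamed these statements, so treat this as an outline to check against the manual for your version, not a recipe.

```python
def primary_config(server_id=1, log_basename="mysql-bin"):
    """my.cnf fragment for the server that accepts writes.

    Binary logging must be on, and every server needs a unique server-id.
    """
    return "[mysqld]\nserver-id={}\nlog-bin={}\n".format(server_id, log_basename)


def replica_config(server_id=2):
    """my.cnf fragment for the remote copy; its server-id must differ."""
    return "[mysqld]\nserver-id={}\n".format(server_id)


def primary_grant_sql(user, replica_host, password):
    """Statement run on the primary to create a replication-only account."""
    return (
        "GRANT REPLICATION SLAVE ON *.* TO '{}'@'{}' IDENTIFIED BY '{}';"
        .format(user, replica_host, password)
    )


def replica_setup_sql(primary_host, user, password, log_file, log_pos):
    """Statements run on the replica after loading a dump of the primary.

    log_file and log_pos are the values SHOW MASTER STATUS reported on the
    primary at the moment the dump was taken.
    """
    return [
        "CHANGE MASTER TO MASTER_HOST='{}', MASTER_USER='{}', "
        "MASTER_PASSWORD='{}', MASTER_LOG_FILE='{}', MASTER_LOG_POS={};"
        .format(primary_host, user, password, log_file, int(log_pos)),
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(primary_config())
    print(replica_config())
    print(primary_grant_sql("repl", "replica.example.net", "change-me"))
    for stmt in replica_setup_sql(
        "primary.example.com", "repl", "change-me", "mysql-bin.000001", 4
    ):
        print(stmt)
```

Note that replication copies mistakes as faithfully as it copies data, so it complements dumps kept off-site rather than replacing them.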

B3333TA!!! 

Posted Monday 2nd June 2008 04:27 GMT

I think you guys are missing the point, it's not about colocation or backups or power or any of that shite. It's about b3ta. What if there's no backup to the b3ta archive? It'd be like the library of Alexandria over again. 5:30AM on Monday and still nothing. I'm not a religious man but here goes.. Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo.

Do we have an update on B3ta yet? 

Posted Monday 2nd June 2008 07:06 GMT

Unhappy

I'm in the office, and can't read the QOTW archive. I also have yet to see a magenta cock today. This is wrong for a Monday.

I do have to ask the question 

Posted Monday 2nd June 2008 07:38 GMT

Happy

Why have you bothered to include the link to b3ta when you know fully well it's unobtainable?

@AC: data centres 

Posted Monday 2nd June 2008 07:54 GMT

Alert

That data centre with water wouldn't happen to be in Edinburgh would it? I think I've been there too.

CyberIntelAIgent Alien Beings ....... PolderGeists in NetherLands 

Posted Monday 2nd June 2008 07:54 GMT

Coat

"Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo." .... By Seán Posted Monday 2nd June 2008 04:27 GMT

Amen and Hallelujah to That Passionate Restore Point of Immaculate Imperfect Relevance, Seán.

Love ur dDutch. .... Real Get SMARTer IntelAIgents.

Here's a Virtual IntelAIgents Swap Shop/Treasure Vault ....... http://www.ams-ix.net/

I'll get my coat ....there's a CAB AI Called.

Damnitall! 

Posted Monday 2nd June 2008 08:15 GMT

Joke

Work's blocking the Internet Archive's Wayback Machine; I can't even see if they've got older versions of B3TA squirrelled away anywhere!

This is ridiculous. Surely The Planet have backups, disaster recovery, that sort of thing?! Can you imagine if the Emergency Services said "Well we can't actually man the 999 phonelines 24/7/365. We'll need a week off every so often but we'll compensate anyone financially suffering from our unavailability"? Well this is even more serious! 9.15 and B3TA is still down, people!

The sig of the guy in the forum doing the updates... 

Posted Monday 2nd June 2008 08:23 GMT

Joke

... has a slightly unfortunate link title to it, "How fast is youre network?".

I don't know about mine, but parts of his network hit about 50mph recently!

Rasberry ants 

Posted Monday 2nd June 2008 08:32 GMT

Houston it's those damned insulation eating ants that did them in I bet.

Ahh the fools, the fools 

Posted Monday 2nd June 2008 08:44 GMT

Paris Hilton

I was 3/4 of the way through coding a P2P message board that replicated the B3ta messageboard and would have mitigated this, but then I gave up through lack of interest. http://sourceforge.net/projects/b3ta

Will they never listen, think of the children, apologies for length or lack of.

10:37 still no b3ta 

Posted Monday 2nd June 2008 09:38 GMT

Black Helicopters

Productivity across UK offices must be at an all time high for a Monday morning. This must be some sort of conspiracy. Helicopter for obvious reasons.

For those about to shock 

Posted Monday 2nd June 2008 09:45 GMT

Go

There's a temp board set up by Rob here

http://forum.robmanuel.com/viewforum.php?f=3

Remember off grid Rackshack? 

Posted Monday 2nd June 2008 09:49 GMT

IT Angle

An explosion of the local utility transformer took Rackshack's main DC off grid for 4 days a few years ago. Not a minute of downtime was experienced by 17,000 servers.

The subsequent write-up of the event showed both an amazing amount of pre-planning that initially kept everything going and fast adaption to cope with unexpected consequences to keep it going. A long list of lessons were learnt at Rackshack. Were these all passed on to The Planet when it acquired them?

And anyone who has a mission critical server without a geographically separate backup - presumably doesn't understand the concept of backup - or why you have a minimum of two DNS. When those phones start ringing I hope they say "You are fired!". Putting clients' businesses at risk (like no email?) is just darn unethical as well as bad business.

I look forward to hearing any excuses ... from £60/month for a dedicated server, phrases like a pennyworth of tar come to mind.

Shirley every host would have the same risk of this happening 

Posted Monday 2nd June 2008 10:32 GMT

It wouldn't take too much for a co-operative to be set up distributing activities between a predetermined number of other hosts until the crisis is over.

Incidentally, would I have a legal case against b3ta for making me actually have to do some work on a Monday morning and the mental anguish caused by this?

only if it be the will of Allah 

Posted Monday 2nd June 2008 11:00 GMT

Flame

"Allah wu Akbar, Allah the digital, the compassionate please restore the purple cock and domo." .... By Seán Posted Monday 2nd June 2008 04:27 GMT

Ensha Allah.

Or, in layman's terms: the computers were built thanks to Allah, the data was put there by the hand of Allah, the colocation duplication systems were denied by the mighty will of Allah, the fire was started by the great and merciful Allah and the DNS servers are still down thanks to the esteemed and bountiful Allah. Allah be praised - and the rest of us thank fuck it wasn't organised by LizardGov.uk otherwise the data centre would probably still only be half built, at half the original spec for quadruple the cost.

Can we go to stoning now?

Got to love status reports 

Posted Monday 2nd June 2008 11:09 GMT

Especially ones with "I would like to provide an update on where we stand following yesterday's explosion ..."

Data Centre Outage 

Posted Monday 2nd June 2008 11:20 GMT

Coat

Well I guess that's why we don't have our power transformers indoors then!

When they blow up they can do it peacefully in the car park while the UPS kicks in and prepares the generators for taking the load. When they kick in you see a mushroom cloud of diesel smoke, god knows what people think has happened when they see it!

I guess this is just a bad luck story, I can see they are working hard to repair this and get them back online. Would you like to be the one to reboot 9000 servers lol.

Definitely think they could have had a better disaster recovery plan in place. Seems like they only had a basic one and that's it....

The way to make it most resilient is to have two buildings kind of co-located (same business park etc) but not physically adjoined. So if one gets nuked the other can continue.

On the bright side I think they must have saved a bit on the leccy bill.... oops wheres me coat

Lunchtime and... 

Posted Monday 2nd June 2008 11:51 GMT

Unhappy

...still no b3ta. I may have to go out in the fresh air. Nice to see a few b3tans on this comments page here, though.

Unashamed plug for open source DR community 

Posted Monday 2nd June 2008 12:08 GMT

Boffin

If our El Reg moderatrix will permit it (pretty please, Sarah), may I invite Dr Trevor Marshall and other interested parties to join us for discussions at:

http://www.opensolaris.org/os/community/ha-clusters/

and/or

http://blogs.sun.com/SC/

and look particularly for entries related to the Geographic Edition.

"...the company blamed a faulty transformer for the fire." 

Posted Monday 2nd June 2008 12:17 GMT

Coat

So is this why the Heathrow security guy made that fella take his Transformers T-shirt off?

Lets hope 

Posted Monday 2nd June 2008 12:35 GMT

From the seriousness, that no one was hurt or injured. :)

b3ta 

Posted Monday 2nd June 2008 12:48 GMT

Unhappy

Productivity has increased tenfold. I hope b3ta/links is back on line soon. I have a rather humorous clip of Rick Astley to post . . .

That explains 

Posted Monday 2nd June 2008 12:57 GMT

Unhappy

why I spent the whole of Sunday furiously clicking 'refresh' on b3ta.com to no avail, failing to notice anything else around me, or remember to eat.

It's the /talk people I feel sorry for - at least the rest of us can look at the pretty pictures on 4chan.

(Peregrin)

Im at a loss 

Posted Monday 2nd June 2008 12:58 GMT

Alert

I've done far too much work today, and not enough skiving

Rob's emergency board isn't quite providing the same fix

lo, the greebo warrior

Network failure 

Posted Monday 2nd June 2008 13:22 GMT

Coat

I have nothing hosted with them, but it sounds like they were doing fine until the fire dept. forced them to shut down their generators. True, it's not as good as total redundancy, but again, it sounds like they could have coped. Probably the generators weren't even a slight part of the problem--just playing it safe.

Leet! 

Posted Monday 2nd June 2008 13:41 GMT

Coat

From the status page/forum post:

1337 User(s) are reading this topic

Thanks for the compliment, The Planet!

Mine's the one hanging on that wall over... oh, bugger.

@AC: Data Centre & @Glen Turner 

Posted Monday 2nd June 2008 14:11 GMT

Paris Hilton

The systems commonly used are of the HI-FOG mist fire suppression type. The pipe work and nozzles are often mistaken for 'sprinklers' but in fact discharge a very fine mist that puts out the fire and is safe for humans and the hardware. Very common for DC and Telecoms applications.

Gas discharge systems are expensive and the older CO2 fueled systems can be lethal to humans in areas where there's no ventilation.

Paris....'cos she enjoys a good sprinkling every now and again.

Watermelon 

Posted Monday 2nd June 2008 14:41 GMT

The emergency board is pants.

I need to see some badly shopped pictures of kittens damn soon - it's been more than 24-hours since I've seen a bandwagon, domo, teh quo or crudely drawn cocks.

My productivity is through the cranberry roof.

(linbox)

@Matt White 

Posted Monday 2nd June 2008 14:50 GMT

Thumb Up

Argh christ there's another one of me!

That said, I wholeheartedly agree with your b3ta & purple cock related statement dear clone.

and again 

Posted Monday 2nd June 2008 15:47 GMT

guys this is nothing to do with poor disaster recovery. the transformers taking the power from the grid have blown up. this damaged the lines in the building and the floors going to the racks. no matter how good your disaster plan is...it won't allow for this scale of event. they have to replace power cables, etc and the servers are offline until it is safe to turn them back on. this is a serious failure of power...not simply generators not working or a fibre cable being cut

Has no-one thought to enquire 

Posted Monday 2nd June 2008 16:05 GMT

What the PFY was doing at the time? A transformer exploding and taking out three walls sounds deeply suspicious to me.

an entire day without magenta cocks 

Posted Monday 2nd June 2008 16:22 GMT

Unhappy

that's an entire day at work with no magenta cocks or TOAP image challenge entries.

I feel funny.

I do appear to have done some work, but tomorrow I will re-check all my data and almost certainly find critical errors.

Who should I sue?

@ Matt White 

Posted Monday 2nd June 2008 16:32 GMT

Flame

Who are you calling a clone? I'm the original!

@Richard 

Posted Monday 2nd June 2008 17:26 GMT

"nothing to do with poor disaster recovery"

Yes it is. Good DR requires that you have a backup installation sufficiently far from the primary site to withstand events like 9/11, New Orleans, Chernobyl etc.

If a few exploding transformers that take out some racks and cables put you off air, you do not have a valid DR plan. A UPS and generators might provide some measure of local high availability (HA), they don't cut it for DR.

And yes, DR costs more than HA. Just like insurance, you have to pay for adequate protection, or pay the price. There are a number of reports around which show that ~40% of businesses without a DR plan go bust after a disaster. The rest have a very painful few years.

@Brett Patterson 

Posted Monday 2nd June 2008 18:39 GMT

Linux

The nameservers for a particular domain really should be separated geographically and logically (network-wise). Getting a secondary nameserver is free or dirt cheap.

I sometimes hear people say "it doesn't matter much anymore". This is rubbish. Having all of your nameservers down is much worse than just having a service like your website offline. With all of the nameservers down mail to the domain won't queue, it will bounce and people visiting the website will see a message akin to "This domain doesn't exist". Non-technical users might be excused for thinking a company had gone out of business.

Run multiple nameservers in different parts of the world. It's cheap, easy and saves a lot of hassles.
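As a rough illustration of the "logically separated" half of that advice, the sketch below groups a domain's nameserver addresses by enclosing IPv4 network and flags the case where they all sit in the same one. The /24 cut-off is an assumption chosen for the example, the host names and addresses are reserved documentation values rather than real servers, and sharing or not sharing a network says nothing certain about geography, so this is only a first sanity check.

```python
import ipaddress
from collections import defaultdict


def group_by_network(nameservers, prefix=24):
    """Map each enclosing IPv4 network to the nameserver names inside it.

    nameservers: dict of name -> IPv4 address string.
    prefix: network size used for grouping (an assumption, /24 by default).
    """
    groups = defaultdict(list)
    for name, addr in nameservers.items():
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network("{}/{}".format(addr, prefix), strict=False)
        groups[net].append(name)
    return dict(groups)


def shares_single_network(nameservers, prefix=24):
    """True when every listed nameserver falls in one network (or none given)."""
    return len(group_by_network(nameservers, prefix)) < 2


if __name__ == "__main__":
    # Placeholder names and documentation-range addresses.
    same_rack = {"ns1.example.com": "192.0.2.10", "ns2.example.com": "192.0.2.20"}
    spread = {"ns1.example.com": "192.0.2.10", "ns2.example.net": "198.51.100.20"}
    print(shares_single_network(same_rack))
    print(shares_single_network(spread))
```

A domain whose nameservers all land in one group is exposed to exactly the kind of single-site failure described in this thread.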

@ all you B3tards 

Posted Monday 2nd June 2008 19:44 GMT

Black Helicopters

It's perfectly obvious that this fire story is all a cover-up: what's REALLY going on is that the governments of the English-speaking world, having awoken to the very real dangers of the impending recession, have struck pre-emptively and taken steps to increase office productivity by shutting down all known havens of timewasting. Mark my words, icanhascheezburger is next

@Steve 

Posted Monday 2nd June 2008 21:18 GMT

Happy

"Good DR requires that you have a backup installation sufficiently far from the primary site to withstand events like 9/11, New Orleans, Chernobyl etc."

DR does not mean uninterrupted operation, it means a plan to get back in business within an acceptable amount of time. You have to be realistic and match your DR plans to the level of service you are offering otherwise you will be out of the highly competitive lower/mid end hosting biz very quickly.

This is a host with 50K servers, they lost 9K to this event. I believe they have 6 data centers, AFAIK they are all in the Dallas area of Texas taking advantage of the low power costs there. Following your logic they should have one or more data centers sitting idle in another state just in case of a catastrophic event such as this one. There's no way they could do that unless they were selling a much higher grade of service.

They are recovering from their disaster. Last time I checked something like 2/3 of the servers are back up, or in the process of getting back up (and that's in less than 48 hours) and there's a plan in process to temporarily get power to the rest that were directly affected by the explosion.

I have a website that's hosted at one of their other locations, I am critical of their design that put management servers for my location in the center that suffered the fire, causing unnecessary disruption of service that would not have happened if the centers were independent.

My 1st floor server is back on-line. 

Posted Tuesday 3rd June 2008 00:24 GMT

Happy

It came back at 8.32pm BST. Apparently there are not too many more to go now before all the servers are back. Looks like B3ta will be left until last. LOL.

The second floor is running on mains power again, but due to damage to the underfloor power conduits the first floor is all on generator power and will be for the next 10-12 days. Ouch! Hope they've bought plenty of diesel and a mechanic.

@Dick 

Posted Tuesday 3rd June 2008 08:19 GMT

I agree that appropriate DR needs to be matched to the service they are selling. If their customers are happy with an SLA that allows a 48+ hour outage then that's fine. The people I deal with will get upset (putting it mildly) over a 2-hour outage.

There's no need to have an idle data centre elsewhere, though. It could be doing useful work with some spare capacity ready to pick up the load from a site that fails, giving reduced service rather than a full outage. As with all HA/BCDR solutions it's a tradeoff of cost versus RPO/RTO matched to the service agreement that you're selling to your customers. The likes of the Nasdaq or the NYSE will have very different DR requirements to a small company that will be only mildly inconvenienced by a two-day outage.

Personally I wouldn't trust my business to a company with all its data centres in one city, though. There are way too many possible common-mode failures there.
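To put numbers on the 48-hour versus 2-hour comparison above, here is a small worked sketch. It assumes a single outage measured against a 365-day year and treats the recovery time objective (RTO) as a simple ceiling on outage length; the function names are made up for the example and real SLAs usually define their measurement windows differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_percent(outage_hours, period_hours=HOURS_PER_YEAR):
    """Percentage of the period the service was up, given total outage hours."""
    return 100.0 * (1.0 - outage_hours / period_hours)


def meets_rto(outage_hours, rto_hours):
    """True when the outage was no longer than the agreed recovery time."""
    return outage_hours <= rto_hours


if __name__ == "__main__":
    for outage in (2, 48):
        print(
            "{:>2} h outage: {:.2f}% yearly availability, "
            "meets a 4 h RTO: {}".format(
                outage, availability_percent(outage), meets_rto(outage, 4)
            )
        )
```

On these assumptions a single 48-hour outage works out to about 99.45% availability for the year and a 2-hour outage to about 99.98%, which is the gap between the two kinds of customer being discussed.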

Time to skive! 

Posted Tuesday 3rd June 2008 13:56 GMT

Happy

You can get to B3ta through http://207.44.242.20 so server's definitely back online. Seems a bit slow, though!

b3ta is back 

Posted Tuesday 3rd June 2008 15:13 GMT

Happy

http://207.44.242.20

Err WOO YAY! . . . .. ?

Like a Parisian courtesan 

Posted Wednesday 4th June 2008 06:34 GMT

Firefox can't establish a connection to the server at www.b3ta.com.

Firefox can't establish a connection to the server at 207.44.242.20.

Balls.

Thank Dog for that ... 

Posted Wednesday 4th June 2008 07:14 GMT

Thumb Up

Yes, I was hit by the H1P1 debacle .... just happy enough to be back now!

Nightmare is finally over for me... 

Posted Wednesday 4th June 2008 13:06 GMT

I was also hit by the H1P1 debacle... all my servers affected. Half of them are on the second floor and recovered on Monday; the rest are on the first floor and recovered just a few hours ago. Kurt.-
