Facebook blames outage on internal config flaw
Cascading failure feedback loop calamity
Facebook has published a detailed explanation of an internal configuration flaw that left the site unavailable for around two and a half hours overnight - the social network's worst downtime in four years.
The outage stemmed from a cascading series of problems in an error-correction system. Those problems fed a feedback loop that could only be broken by cutting traffic to a database cluster and rebooting the site.
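The feedback loop is easier to see in a toy model. The Python sketch that follows is entirely hypothetical. The function name `simulate`, the tick-based timing and the client and capacity figures are invented for illustration and are not taken from Facebook's explanation. The model assumes three things. Each client that has seen a bad value queries the database cluster. When the cluster is overloaded, every query errors. A client treats that error as yet another bad value and retries. Limiting how many clients are let through each tick stands in for cutting traffic to the cluster.

```python
from typing import List, Optional


def simulate(num_clients: int, capacity: int, ticks: int,
             admit_limit: Optional[int] = None) -> List[int]:
    """Toy model of a retry feedback loop (hypothetical, for illustration).

    Returns the number of database queries issued in each tick.
    Assumes the bad config value has already been corrected, so any
    query the cluster can actually answer returns a good value.
    """
    needs_refresh = num_clients  # every client has seen the bad value
    history = []
    for _ in range(ticks):
        # admit_limit models cutting traffic: only that many clients get through.
        if admit_limit is None:
            querying = needs_refresh
        else:
            querying = min(needs_refresh, admit_limit)
        history.append(querying)
        if querying > capacity:
            # Overloaded: every query errors, each client treats the error
            # as an invalid value, discards its cached copy and retries.
            continue
        # The cluster coped, so these clients now hold the corrected value.
        needs_refresh -= querying
    return history


print(simulate(1000, 300, 6))                   # [1000, 1000, 1000, 1000, 1000, 1000]
print(simulate(1000, 300, 6, admit_limit=250))  # [250, 250, 250, 250, 0, 0]
```

With no throttling, 1,000 clients hammering a cluster that can serve 300 never recover, even though the underlying value is already fixed. Admitting 250 per tick drains the backlog in four ticks.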
The social network apologised for the downtime, which affected servers worldwide, and promised to redesign the faulty system it used to correct configuration values to prevent future problems in the area. In the meantime, this system has been taken out of commission.
Thursday night's outage follows similar but less severe problems the day before. ®
It makes me cross
Each time Facebook publishes any kind of technical information, their post is plagued by hundreds of people commenting and claiming to know how to do it better or fix it. And generally they're talking absolute crap.
Facebook may not be perfect but they know what they're doing.
I'm not sure why it makes me so cross - but it does. It really does!
get a life?
I laugh at all the people telling facebook users to "get a life" etc.
Ha. Ha. Ha.
I got exactly the same sort of glib, condescending shyte from people when I first used Cix, fidonet or indeed email/www.
I am not particularly gifted in the ways of multi-datacentre server management, but the ultimate solution appeared to be to turn it off and on again.
Outsourced to Renholm Industries?
Facebook uses Akamai for the static files, such as photos, images, etc.
They don't use it for the main www site normally.
However, yesterday, during the outage, they changed the DNS entry for www.facebook.com to point to:
root@northway# host www.facebook.com.
www.facebook.com is an alias for sorry.ak.facebook.com.edgesuite.net.
sorry.ak.facebook.com.edgesuite.net is an alias for a1030.g.akamai.net.
a1030.g.akamai.net has address 188.8.131.52
a1030.g.akamai.net has address 184.108.40.206
As they said, they needed to stop all traffic to fix the problem, so temporarily diverting it to their network of Akamai servers seems to be the way they chose to do it.
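The `host` output shows two aliases before any address appears. The minimal Python sketch that follows walks that chain the way a resolver would. It uses only the two alias records from the output above. The function name `resolve_alias_chain` and the hop limit are hypothetical. A real resolver also deals with TTLs, caching and the final address lookup.

```python
from typing import Dict, List

# Alias (CNAME) records as printed by `host` in the comment above.
CNAMES: Dict[str, str] = {
    "www.facebook.com.": "sorry.ak.facebook.com.edgesuite.net.",
    "sorry.ak.facebook.com.edgesuite.net.": "a1030.g.akamai.net.",
}


def resolve_alias_chain(name: str, cnames: Dict[str, str],
                        max_hops: int = 8) -> List[str]:
    """Follow alias records from `name` until a name with no alias is reached."""
    chain = [name]
    while chain[-1] in cnames:
        if len(chain) > max_hops:
            raise ValueError("alias chain too long (possible CNAME loop)")
        chain.append(cnames[chain[-1]])
    return chain


print(" -> ".join(resolve_alias_chain("www.facebook.com.", CNAMES)))
# www.facebook.com. -> sorry.ak.facebook.com.edgesuite.net. -> a1030.g.akamai.net.
```

For a live lookup, Python's standard `socket.gethostbyname_ex()` returns the canonical name, the alias list and the addresses in one call. In this model, diverting www traffic only takes changing the first record. The later hops already point into Akamai's own names.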
Superman was spinning it during the downtime.