The Register® — Biting the hand that feeds IT


Router crash downs CloudFlare services

A lesson in disclosure


On Sunday, US time, prominent Web services outfit CloudFlare sent an instruction to its routers in response to an attempted DoS attack, and took down its own network instead.

In a rare example of detailed disclosure, the company has posted an explanation of what happened on its blog.

The network collapse occurred, the company explains, after it detected an attempted denial-of-service attack against a customer’s DNS servers using packets that were between 99,971 and 99,985 bytes long – an oddity, CloudFlare notes, because that’s so much larger than the Internet’s typical packet length (500 – 600 bytes according to the company) and larger than the 4,470-byte maximum packet size it allows on its internal network.

So it wrote a JunOS rule (CloudFlare is a Juniper shop) to drop the packets, propagated the rule to its routers – and for reasons unknown, that rule crashed all the routers at which the instruction arrived.

“Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed,” the blog post notes.
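The length range in the rule is itself a warning sign. As a minimal sketch only, and not CloudFlare's or Juniper's actual tooling, a pre-deployment sanity check could flag a range that no real packet can satisfy: the IPv4 Total Length field is 16 bits wide, so a packet cannot exceed 65,535 bytes. The function name `check_length_match` and its parameters are hypothetical; the 4,470-byte default is the internal maximum quoted above.

```python
# Largest value the 16-bit IPv4 Total Length header field can hold.
MAX_IPV4_PACKET_BYTES = 65_535


def check_length_match(low, high, internal_mtu=4_470):
    """Return warnings for a packet-length match range (bytes, inclusive).

    Hypothetical helper: it only inspects the numbers in the rule and
    knows nothing about how any real router evaluates them.
    """
    warnings = []
    if low > high:
        warnings.append("range is empty: low bound exceeds high bound")
    if high > MAX_IPV4_PACKET_BYTES:
        warnings.append(
            f"upper bound {high} exceeds the IPv4 maximum of "
            f"{MAX_IPV4_PACKET_BYTES} bytes"
        )
    if low > internal_mtu:
        warnings.append(
            f"lower bound {low} exceeds the internal maximum of "
            f"{internal_mtu} bytes, so no internal packet can match"
        )
    return warnings


# The range reported in the article trips both size checks.
issues = check_length_match(99_971, 99_985)
for issue in issues:
    print("WARNING:", issue)
```

A check like this would not explain why the routers ran out of memory; it only shows that the rule could have been questioned before it was propagated.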

The crashes happened in such a way, CloudFlare says, that the routers didn’t reboot automatically, which meant that they couldn’t be accessed remotely; and worse, those routers that did wake back up copped the entire traffic load, couldn’t cope, and crashed again.
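The repeated crashes follow from simple arithmetic. The toy model below is a sketch under invented assumptions (load is split evenly across whatever routers are online, and any overloaded router takes the rest down with it); the load and capacity figures are hypothetical and do not come from CloudFlare.

```python
import math


def routers_needed(total_load, capacity_per_router):
    """Fewest routers that must be online at once to carry the load."""
    return math.ceil(total_load / capacity_per_router)


def restart_one_at_a_time(total_load, capacity_per_router, router_count):
    """Bring routers back singly; return how many are online at the end.

    Assumes an even split of traffic. If the per-router share exceeds
    capacity, every online router crashes again.
    """
    online = 0
    for _ in range(router_count):
        online += 1
        if total_load / online > capacity_per_router:
            online = 0  # overloaded: everything that was up goes down again
    return online


# Hypothetical units: 100 units of traffic, 30 units per router, 10 routers.
needed = routers_needed(100, 30)
still_up = restart_one_at_a_time(100, 30, 10)
print(f"need {needed} routers at once; {still_up} survive a one-by-one restart")
```

In this model no router ever stays up when restarted alone, which is why recovery needs several routers brought back together or traffic held off until enough capacity is online.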

Accounts covered by SLAs will get credits, the company says, and it is investigating the problem with Juniper. ®


I think someone meant to do that.

Chris, I'll make you a bet: the packets weren't really "between 99,971 and 99,985 bytes long", they just had header fields saying they were. They sort of say as much when they say no packet should have matched the rule because no packets were actually that long. And that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly.


Re: Bill Re: test?

".......maybe they didn’t have time to go through full testing?......" I've seen similar mistakes, usually they are a combination of management pressure - "fix that NOW" - and over-confidence in one's own ability. Many, many moons ago, there was a rumour of a ping of death for CISCO Catalyst routers (5000 models IIRC) and much argument amongst netties as to whether it would work or not. At company I was working for at the time, our network architect, having the authority to do as he pleased, was firmly in the "it-won't-work" camp and decided to test it against one of our routers, only to find not only did it work but it also propagated through all the same models in the network. Cue embarrassing and company-wide network outage which we definitely did not step up and explain to the customers!


Re: test?

Except this company sell a CDN product that is supposed to relieve stress on servers when they are under DoS and provide (and I quote) "Always Online™" and "Rock solid reliability" so that even if your server goes down, your visitors can still see your content.

So it's a bit embarrassing to not test, to just roll out, and not have an adequate testing procedure (I mean, rolling it out to all your routers before you notice is a bit stupid, no matter what).

And I can attest that at least one site I'm aware of was down for quite a long time despite the fact that it uses CloudFlare CDN to keep itself online "no matter what" and was returning all sorts of errors even though the underlying origin servers were up. Next time, their accountants will be telling them to test before they deploy, I think.

