The Register® — Biting the hand that feeds IT

Bitbucket's Amazon DDoS - what went wrong

A cautionary cloud tale

After a DDoS brought down Bitbucket's web-based code-hosting service for more than 19 hours over the weekend, Jesper Nøhr speculated the attack had exposed a flaw in the sky-high Amazon infrastructure that hosts the site. Nøhr - who runs Bitbucket - has since spoken to an "Amazon executive" about the attack, and according to his account of the conversation, his earlier speculation was right on the money.

Bitbucket runs its entire site on Amazon's Elastic Compute Cloud (EC2), which provides scalable processing resources, and it uses Amazon's Elastic Block Store (EBS) to store its database, log files, user data, and more. EBS provides persistent storage for EC2 server instances. The problem, according to Jesper Nøhr, is that the storage system operates on a network channel that's exposed to the outside internet.

Bitbucket's Amazon setup worked well enough until late last Friday, when Nøhr realized EBS was "virtually unavailable." The outage persisted for more than 16 hours, in part because both Nøhr and Amazon's support reps assumed there was some sort of problem with EBS. According to Nøhr, the first rep he spoke to attributed the slowdown to the fact that EBS is a shared resource used by other bandwidth-hungry Amazon customers. In a statement sent to The Reg, Amazon gives a similar story.

"Over the weekend, one of our customers reported a problem with their Amazon Elastic Block Store (EBS)," the statement reads. "This issue was limited to this customer’s single Amazon EBS volume and other customers were not affected. We did not immediately look beyond the reported problem and spent too much time focusing on what was believed to be an issue with the Amazon EBS volume."

But as it turns out, Bitbucket's Amazonian infrastructure had been DDoSed. "We were attacked. Bigtime. We had a massive flood of UDP [User Datagram Protocol] packets coming in to our IP, basically eating away all bandwidth to the box," Nøhr wrote on his blog. "So, basically a massive-scale DDOS. That’s nice."

Once the cause of the problem was determined - more than 16 hours after the attack started - Amazon blocked the offending traffic, and things were soon back to normal. But Nøhr - and so many other netizens who followed the story - couldn't understand why a DDoS attack tied up what should have been "internal" storage resources.

Nøhr guessed that Bitbucket's storage sits on the same network interface that connects the site to the outside world, and according to Nøhr, this has been confirmed by Amazon. "We were speculating whether all the traffic was on the same interface, and [the Amazon EC2 executive] told us this was true," Nøhr told The Reg.

According to Nøhr, Amazon also told him that the company's Quality of Service technology - meant to prioritize the storage traffic - did not work as the company expected. "They said they were supposed to prioritize EBS traffic over other traffic so we wouldn't be bogged down by external traffic," Nøhr says. "But they admitted it wasn't working the way they wanted it to."
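
To see why prioritization matters when storage and public traffic share one interface, here is a toy model in Python. It is only a sketch under loud assumptions: the function name, the proportional-sharing rule, and every number in it are invented for illustration, and nothing here describes how Amazon's network or its QoS actually works. It also ignores the possibility that a flood saturates the link upstream of wherever the prioritization is applied.

```python
def storage_throughput(capacity, storage_offered, flood_offered, prioritize_storage):
    """Return storage packets delivered per tick on one shared interface.

    Hypothetical model: all parameters are packets per tick.
    """
    if prioritize_storage:
        # Strict priority: storage traffic is served before anything else.
        return min(storage_offered, capacity)
    total = storage_offered + flood_offered
    if total <= capacity:
        # The link is not saturated, so everything gets through.
        return storage_offered
    # No working priority: each class gets a share of the link
    # proportional to how much traffic it offers.
    return capacity * storage_offered / total


# Made-up figures: a link that carries 1,000 packets per tick, storage
# offering 200, and a flood offering 9,800.
without_qos = storage_throughput(1000, 200, 9800, prioritize_storage=False)
with_qos = storage_throughput(1000, 200, 9800, prioritize_storage=True)
print(without_qos, with_qos)
```

With these invented numbers, storage keeps its full 200 packets per tick when it is prioritized and drops to a tenth of that when it is not, which is roughly the kind of degradation Nøhr describes.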

Amazon has not responded to a request for comment on this specific issue. But an earlier statement from the company doesn't contradict what Nøhr has said.

"What we ultimately found was not a problem with Amazon EBS, but rather that the customer’s Amazon EC2 instance was receiving a very large amount of network traffic," the statement reads. "This large flood of traffic overwhelmed the networking of the customer’s single Amazon EC2 instance and caused performance to degrade on all I/O operations on the instance. Once we properly diagnosed the problem, we worked with the customer to put measures in place to help mitigate the unwanted traffic they were receiving."

Like many, Scott Morrison - chief architect and VP of engineering at Layer 7, a company that offers an outside security solution for Amazon's so-called cloud - finds it rather hard to believe that Amazon would put EBS on an outside net connection. "It seems like [EBS] shouldn't be externally accessible," he tells The Reg. "It's bizarre. That's sort of like making NFS mounts accessible outside your firewall - something you would never do."

The other problem with Amazon's setup, according to Jesper Nøhr, is that customers like him have no way of viewing the DDoS traffic hitting their sites - i.e. they have no way of identifying an attack. What's more, he says, Amazon told him that even the "Gold" support reps he initially spoke to didn't have a way of viewing the traffic.

"[Amazon] said that there is a department at Amazon that monitors such traffic, but [Amazon] said the first line of support can't see it," Nøhr says. "In short, you can't really see into the problem, because Amazon's Web Services is kind of a black box."

None too surprisingly, Layer 7's Scott Morrison calls this "a huge problem." Again, Amazon did not respond to a request for comment on this particular issue.

On Friday, Nøhr paid $400 to get access to Gold support. To Amazon's credit, it has told Nøhr it will refund the money. And though he questions Amazon's setup, he feels that the company ultimately responded quite well to the problem. "Amazon has been very transparent with us and very apologetic. I don't want their name to be dragged through the mud."

Amazon does tell The Reg that such an attack might have been avoided if Bitbucket had been using additional Amazon services, such as the recently announced Elastic Load Balancing and Auto-Scaling. And Nøhr says the company told him much the same.

Nøhr says the company also told him that in the future, it would provide additional information about web traffic to customers and support personnel in an effort to better identify such attacks.

Nonetheless, says Layer 7's Scott Morrison, all this should serve as a cautionary tale for those eyeing the, um, cloud. "This is exactly what people have been warning about in the cloud for a while," he says. "Sure enough, here is the perfect example." ®

Latest Comments

Sad but true

Once the incoming network was saturated, probably nothing would have stopped the attack. I talked to some of our CTOs & architects and put together a summary - http://cloudsecurity.trendmicro.com/ddos-and-the-cloud-sad-but-true/# .

TT

Using QoS to prioritise traffic?

Are they really that naive at Amazon Central? Prioritisation only really works when all the packets going to and fro are well behaved. Any half aware script kiddie worth his salt would know that and use it to his advantage.

not so interesting, actually

I'm not sure you actually hit the real source of the problem. It seems to me that there is a different true problem, which is that the EC2 security group rules (firewall) are implemented on the host, not on an external device. I assume Bitbucket's rules denied all those UDP packets, but they still hit the host and thus caused network contention. The EBS issue is secondary in that its traffic should have had priority over the UDP packets. But the real problem was that nobody could see the UDP traffic and respond appropriately.
