IT Pro confession: How I helped in the BIGGEST DDoS OF ALL TIME

Original URL: https://www.theregister.com/2013/03/28/i_accidentally_the_internet/

Oh Trevor, how could you? Like this

Posted in On-Prem, 28th March 2013 14:24 GMT

Sysadmin blog I contributed to the massive DDoS attack against Spamhaus. What flowed through my network wasn't huge – it averaged 500Kbit/sec – but it contributed. This occurred because I made a simple configuration error when setting up a DNS server; it's fixed now, so let's do an autopsy.

The problem

I should start off by apologizing to CloudFlare and Spamhaus; my lapse contributed to a DDoS against their infrastructure. More damning than merely having been an unwitting participant is that I knew enough about this sort of attack to have set up rudimentary protections against it and yet I still forgot the critical component: actually disabling recursive lookups.

The way a DNS amplification attack works is simple. DNS servers can be configured in one of two basic ways. In one possible configuration a DNS server serves only domains for which it is responsible (authoritative). In the other configuration the DNS server serves those domains and also goes looking on the wider internet for any domains it isn't personally set up to manage (recursive).

Your DNS server at the office might be configured to be authoritative for localdomain.yourcompany.com. If you want to go to www.google.com then in a strictly authoritative configuration your DNS server won't be able to provide an answer: it doesn't know where www.google.com is.

In a recursive configuration the DNS server asks the list of root servers it has preconfigured who owns the ".com" domain. It then asks the .com servers who owns "google.com". It then asks "google.com" who owns "www.google.com" and delivers that address back to you.
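To make that walk concrete, here is a toy sketch in Python. It does no real networking: the SERVERS table, the server labels and the 192.0.2.10 address (taken from a range reserved for documentation) are all made-up placeholders, and a real resolver also deals with caching, timeouts and multiple name servers per zone.

```python
# Toy model of a recursive lookup. Every name, label and address below is
# an illustrative placeholder, not real DNS data.
SERVERS = {
    "root": {"referrals": {"com.": "com-servers"}},
    "com-servers": {"referrals": {"google.com.": "google-ns"}},
    "google-ns": {"answers": {"www.google.com.": "192.0.2.10"}},
}


def resolve(name, start="root"):
    """Follow referrals from the root until a server answers.

    Returns (address, list_of_servers_asked).
    """
    server = start
    asked = [server]
    while True:
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            return zone["answers"][name], asked
        # No answer here: look for a referral covering the name.
        for suffix, next_server in zone.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                asked.append(server)
                break
        else:
            raise LookupError("no server knows about " + name)


address, asked = resolve("www.google.com.")
print(address, asked)
```

An authoritative-only server is, in these terms, one that only ever consults its own "answers" and never follows a referral on a client's behalf.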

As you can see, recursive DNS servers are what allow the internet to work. They are also an attack vector. Let's say that you leave your recursive server open to the internet. Now not only can you ask your DNS server for information about other DNS servers on the internet, so can anyone else. If someone asks your server "where is www.google.com" a whole bunch of times then your server starts flooding google.com's DNS servers. Worse, if the attacker forges the source address on those queries, your server sends its much larger answers to the forged address: the victim.

For every 1 byte of data sent to your DNS server 50 bytes of traffic end up directed at the target. This is a DNS amplification attack. The issue has been around for ages, but it has taken this latest over-the-top attack to get most DNS administrators to sit up and notice. Things aren't great right now for CloudFlare or Spamhaus, but open recursive servers are finally starting to close. What follows is nerd detail on the problem.
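The arithmetic behind that ratio is worth spelling out. The sketch below simply applies the 50-to-1 figure quoted above; the function names are invented for illustration, and real amplification factors vary with the query type and response size.

```python
AMPLIFICATION = 50  # bytes sent to the target per byte of query, as quoted above


def reflected_kbit(query_kbit, factor=AMPLIFICATION):
    """Traffic aimed at the target for a given rate of incoming queries."""
    return query_kbit * factor


def query_kbit_needed(target_kbit, factor=AMPLIFICATION):
    """Query rate an attacker must send to produce a given reflected rate."""
    return target_kbit / factor


print(reflected_kbit(10))       # Kbit/sec leaving the open resolver
print(query_kbit_needed(500))   # Kbit/sec the attacker has to spend
```

At that ratio, the 500Kbit/sec that averaged through one small server would cost an attacker only about 10Kbit/sec of queries, which is why thousands of open resolvers add up so quickly.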

My server

The server in question is what I term an "edge scrubber." The system itself is nothing particularly special. It is an Intel Atom D510 with 2GB of RAM and two 1Gbit Intel NICs, running CentOS 5.9 with a real-time kernel.

The scrubber sits on the very edge of my network and does what most Cisco networking folks would use a router for. It has an external IP address given to me by my ISP and a gateway on the ISP's network. I have a /27 block of IP addresses assigned to me by my ISP and this device hands those out to servers and routers within the datacenter.

The scrubber also serves various "scut work" functions on behalf of all the other devices on the network. It is the datacenter-local network time server, external DNS server, IDS, edge firewall and bandwidth limiter. Every single packet in or out of the datacenter passes through this box; it handles 30Mbit symmetrical with some heavy deep packet inspection just fine. 45Mbit seems to be the maximum it will do reliably.

The bad DNS configuration

The DNS server in question was one I had gotten halfway through setting up to "scrub" bad addresses. I hadn't tested the setup, as it had been copied directly from a testlab system and did not yet have any production servers pointed at it.

The goal is to use the malwaredomains DNS blackhole list to make sure that nothing in the datacenter can access the worst of the known "bad guy" websites out there. Cutting off DNS access to these sites effectively neuters a whole host of potential browser-based attacks and even helps thwart any botnets that might take hold. The config consists of three files:

Nuts and bolts time

/etc/namedb/blockeddomain.hosts
This file redirects any attempts to contact "bad" domains to a honeypot server I maintain. Here users will get a website warning informing them that something nasty could have happened and I trap information on who, what, when, where and why.

; This zone will redirect all requests back to the blackhole itself.
$TTL 86400 ; one day
@       IN SOA  blocked.mydomain.com. blocked.mydomain.com. (
                1       ; serial
                28800   ; refresh 8 hours
                7200    ; retry 2 hours
                864000  ; expire 10 days
                86400 ) ; min ttl 1 day
        NS      blocked.technicare.com.
        A       [IP OF MONITORING SERVER]
*       IN A    [IP OF MONITORING SERVER]
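The effect of that zone file, with its bare A record plus the * wildcard, is that a blocked domain and everything beneath it resolve to the monitoring server. A rough Python model of that behaviour follows; the BLOCKED set, the SINKHOLE address (from a documentation range) and the function name are hypothetical stand-ins, not part of my setup.

```python
SINKHOLE = "192.0.2.53"            # placeholder for the monitoring server's IP
BLOCKED = {"badsite.example"}      # placeholder for zones on the blackhole list


def blackhole_lookup(name):
    """Return the sinkhole address if name is a blocked zone or sits under one.

    Returns None when the name is not covered, meaning normal resolution
    would apply.
    """
    labels = name.rstrip(".").lower().split(".")
    # Try the full name, then each parent domain in turn.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in BLOCKED:
            return SINKHOLE
    return None


print(blackhole_lookup("www.badsite.example"))
print(blackhole_lookup("www.goodsite.example"))
```

Matching is done on whole labels, so a lookalike such as notbadsite.example is not caught by an entry for badsite.example.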

/etc/namedb/update_nameservers
This is a script that I run every night at midnight. It checks to see if the malwaredomains list is newer than the one I already have and then downloads it. It then restarts BIND.

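cd /etc/namedb
# -N: only download if the remote copy is newer than the local one
wget -N http://mirror1.malwaredomains.com/files/spywaredomains.zones
# reload BIND so it picks up the new list
/etc/init.d/named restart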

/etc/named.conf
This is my DNS configuration file.

options {
    directory "/etc";
    pid-file "/var/run/named/named.pid";
    check-names master ignore;
    check-names slave ignore;
};

zone "." {
    type hint;
    file "/etc/db.cache";
};

include "/etc/namedb/spywaredomains.zones";

The solution

I knew about recursion attacks; I even went so far as to set up countermeasures. Should DNS traffic for any reason exceed 1Mbit then the scrubber server was to e-mail me at once and lock all DNS traffic down to 500Kbit. The alarm went off late Tuesday night reporting DNS traffic of 10Mbit.
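That alert-and-clamp rule can be sketched as a tiny decision function. This is only an illustration in Python: the function name, the return shape and the idea of judging a single rate sample are assumptions, not the scrubber's actual configuration.

```python
ALERT_THRESHOLD_KBIT = 1000   # 1Mbit: above this, raise the alarm
CLAMP_KBIT = 500              # rate DNS traffic is locked down to afterwards


def police_dns(observed_kbit):
    """Return (send_alert, rate_limit_kbit) for one DNS traffic sample.

    rate_limit_kbit is None while traffic is under the threshold.
    """
    if observed_kbit > ALERT_THRESHOLD_KBIT:
        return True, CLAMP_KBIT
    return False, None


print(police_dns(10000))   # the 10Mbit burst that set off the alarm
print(police_dns(800))     # ordinary traffic
```

A cap like this limits the damage but does not stop the abuse: the server still answers forged queries, just more slowly.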

My mistake stems from the simple assumption that BIND disables recursion for outside clients by default. That change was made with BIND 9.4 way back in 2007. For reasons incomprehensible to me, CentOS 5.9 (which my edge scrubber is currently running) ships BIND 9.3.6, which means that by default recursion requests are honoured.

The fix required is simple; after check-names slave ignore; but before }; I needed to insert allow-recursion { [MY SUBNET]/27; };. This instructs BIND to only honour recursion requests from servers inside my datacenter. Using allow-recursion { 127.0.0.1; }; instead would limit it to only that server. That's all there is to fixing that issue![1]
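What allow-recursion boils down to is an address-match check on the client before any recursive work is done. The Python sketch below models that check with the standard ipaddress module; the 192.0.2.0/27 block is a documentation range standing in for [MY SUBNET]/27, and the function name is made up for illustration. It is a model of the idea, not of BIND's internals.

```python
import ipaddress

# Placeholder for the datacenter's /27; the real subnet is not shown here.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/27")]


def recursion_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


print(recursion_allowed("192.0.2.5"))     # inside the /27
print(recursion_allowed("203.0.113.9"))   # some stranger on the internet
```

Swapping the list for a single 127.0.0.1/32 entry gives the localhost-only variant. Note that a check like this trusts the source address, so it protects you only insofar as forged packets claiming to come from your own subnet are dropped at the edge.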

One little number

I have been working with CentOS 6 in my lab. Every new VM, every hardware install, everything has been CentOS 6 for so long that I forgot I even still have 5.x units in the field. I had gotten used to a BIND version somewhere north of 9.8. The edge scrubber, however, has been in place and doing yeoman's work since long before CentOS 6 came out.

I made an assumption during an application configuration that led to one of my servers being used as part of the largest denial of service attack the internet has ever seen. A service I rely upon – Spamhaus – was inconvenienced due to my negligence. I am incredibly, incredibly sorry; I hope that helping others avoid the same mistake will begin to atone for this administrative misdeed.

[1] The keen eye will notice two other flaws in my server design. The first is that BIND isn't chrooted. This is because the spywaredomains.zones file from malwaredomains isn't really designed with Red Hat-based distros in mind. If you were to chroot BIND you'd have to post-process the zone file to cope with the path differences. Since I'm not doing shared virtual hosting and I use fail2ban, I figure I can probably get away without it.

The second is that DNSSEC isn't enabled. I deserve 50 lashes with a wet noodle for that, but I've been lazy and have been putting off the upgrade to CentOS 6 on this system, which would enable that by default. ®