Boffins supercharge the 'hosts' file to save users plagued by DNS outages
Chinese Academy of Sciences thinks it has a way to give DNS a backup
The venerable Domain Name System (DNS) is becoming known for fragility, and keeping track of your own favourite sites' IP addresses is a pain. So a group of researchers want to automate the upkeep of the hosts file to give users an emergency backup if their DNS provider blacks out.
The idea is that DNS records could be double-checked against a set of binary tuples of IP address and domain name, held outside the DNS infrastructure, as a kind of sanity check against something going wrong.
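As a rough illustration of that sanity check, here is a minimal Python sketch. It is not taken from the paper: the `Verdict` names, the `sanity_check` function and the sample addresses (from the documentation ranges) are all assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    MATCH = "match"        # DNS answer agrees with the backup table
    MISMATCH = "mismatch"  # DNS answered, but only with unknown addresses
    FALLBACK = "fallback"  # DNS gave nothing; use the backup addresses
    UNKNOWN = "unknown"    # domain is not in the backup table


def sanity_check(domain, dns_ips, backup):
    """Compare a DNS answer with (ip, domain) tuples held outside DNS.

    Returns a verdict plus the set of addresses worth trusting.
    """
    known = {ip for ip, name in backup if name == domain.lower()}
    if not known:
        return Verdict.UNKNOWN, set()
    answered = set(dns_ips)
    if not answered:
        return Verdict.FALLBACK, known
    if answered & known:
        return Verdict.MATCH, answered & known
    return Verdict.MISMATCH, known


# Hypothetical backup table: two addresses recorded for one domain.
backup = [("192.0.2.10", "example.com"), ("192.0.2.11", "example.com")]

# An empty DNS answer (say, during an outage) falls back to the table.
print(sanity_check("example.com", [], backup))
```

Treating any overlap between the DNS answer and the table as a match is a deliberately lenient choice; a stricter checker might flag any unfamiliar address.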
The Chinese proposal at arXiv would be disappointing as a replacement for the DNS, since its peak “recall” rate is just over 90 percent. But if there were a critical outage in a corner of the system, users would have a better than 90 percent chance of being able to find Websites while they waited for the system to stabilise again.
Since El Reg has reported on more than one major DNS outage per month in 2017 alone, plus some malicious attacks, some kind of backup is probably useful (whether or not this proposal is it).
So what have Caiyun Huang of the Chinese Academy of Sciences and collaborators (from the academy and from the country's CERT) suggested?
They offer their “Self-Feedback Correction System for DNS (SFCSD)” as a souped-up hosts file: instead of a simple manually-maintained table, they've written software to track SSL, DNS and HTTP traffic, with filtering by CDN CNAME and “non-homepage URL feature strings”, and Web page fingerprinting.
The end result is still recorded in the hosts file – tuples of IP address and domain name.
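For readers who haven't opened a hosts file lately, the sketch below reads the conventional format (an address, then one or more names, with `#` starting a comment) into such tuples. The `parse_hosts` helper and the sample entries are illustrative assumptions, not code from the paper.

```python
import ipaddress


def parse_hosts(text):
    """Return (ip, hostname) tuples from hosts-file text."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue                          # address with no name
        try:
            ip = str(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue                          # skip malformed addresses
        for name in fields[1:]:               # canonical name plus aliases
            pairs.append((ip, name.lower()))
    return pairs


# Hypothetical file contents using documentation address ranges.
sample = """
# emergency backup entries
192.0.2.10   example.com www.example.com
2001:db8::1  example.org   # IPv6 works too
not-an-ip    broken.example
"""

for ip, name in parse_hosts(sample):
    print(ip, name)
```

Each alias becomes its own tuple, which keeps lookups by domain name simple.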
The group reckons SFCSD can hit “94.3 percent precision and 93.07 percent recall rate”, can process traffic at up to 8 Gbps and, running standalone, can recall as many as 1,000 tuples per day and provide correction for 200 domains.
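Precision and recall have standard definitions: precision is the share of reported tuples that are correct, recall is the share of correct tuples that get reported. The sketch below applies those textbook definitions to sets of (IP, domain) tuples. How the researchers built their ground truth is not covered here, so the data is made up purely for illustration.

```python
def precision_recall(predicted, truth):
    """Textbook precision and recall over sets of (ip, domain) tuples."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth: five correct tuples.
truth = {
    ("192.0.2.1", "a.example"),
    ("192.0.2.2", "b.example"),
    ("192.0.2.3", "c.example"),
    ("192.0.2.4", "d.example"),
    ("192.0.2.5", "e.example"),
}

# Hypothetical system output: four tuples, three of them correct.
predicted = {
    ("192.0.2.1", "a.example"),
    ("192.0.2.2", "b.example"),
    ("192.0.2.3", "c.example"),
    ("198.51.100.9", "d.example"),
}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```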
Here's the system schematic from the paper:
Figure: How the Chinese hosts file automation system works
To create the fingerprints, SFCSD uses Google's SimHash, which creates a 64-bit hash of page text. The hash is designed so that similar text blocks (in this case, home pages) produce similar hashes, so two pages can be compared by comparing their fingerprints.
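A bare-bones version of the SimHash idea fits in a few lines of Python. This is a sketch under stated assumptions rather than the researchers' implementation: the word-level tokenising, the equal token weights and the choice of BLAKE2b as the per-token hash are all picked here for convenience.

```python
import hashlib
import re


def simhash64(text):
    """64-bit SimHash over lowercase word tokens, all weighted equally."""
    counts = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            # Each token votes +1 or -1 on every bit position.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if counts[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


page_then = "Welcome to Example Corp, makers of fine widgets since 1999"
page_now = "Welcome to Example Corp, makers of fine widgets since 2001"
print(hamming(simhash64(page_then), simhash64(page_now)))
```

A small Hamming distance between two fingerprints suggests the pages are near-duplicates. Where to draw the line is a tuning decision, and no threshold is assumed here.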
The system tests what it puts into the hosts file, and asks for user intervention only if it can't verify the IP/domain pair.
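That test-then-ask workflow might look something like the following sketch. The `triage` function and the stand-in `verify` callback are hypothetical; a real check would presumably involve the fingerprinting and filtering steps, which are stubbed out here.

```python
def triage(candidates, verify):
    """Split candidate (ip, domain) pairs by whether automatic checks pass."""
    accepted, needs_review = [], []
    for pair in candidates:
        try:
            ok = verify(*pair)
        except Exception:
            ok = False  # a check that errors out counts as unverified
        (accepted if ok else needs_review).append(pair)
    return accepted, needs_review


# Stand-in verifier: pretend only this pair passes the automatic checks.
verified = {("192.0.2.10", "example.com")}


def verify(ip, domain):
    return (ip, domain) in verified


candidates = [("192.0.2.10", "example.com"), ("198.51.100.7", "example.net")]
accepted, needs_review = triage(candidates, verify)
print("write to hosts:", accepted)
print("ask the user about:", needs_review)
```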
As you might expect, the system's fingerprinting breaks on sites that use HTTPS, but the regex matching and CDN CNAME filtering are unaffected. ®