This major internet routing blunder took A WEEK to fix. Why so long? It was IPv6 – and no one really noticed

When you meant to type /127 but entered /12 instead


Comment: Last week, an internet routing screw-up propagated by Verizon for three hours sparked havoc online, leading to significant press attention and industry calls for greater network security.

A few weeks before that, another packet routing blunder, this time pushed by China Telecom, lasted two hours, caused significant disruption in Europe and prompted some to wonder whether Beijing's spies were abusing the internet's trust-based structure to carry out surveillance.

In both cases, internet engineers were shocked at how long it took to fix traffic routing errors that normally only last minutes or even seconds. Well, that was nothing compared to what happened this week.

Cloudflare's director of network engineering Jerome Fleury has revealed that the routing for a big block of IP addresses was wrongly announced for an ENTIRE WEEK and, just as amazingly, the company that caused it didn't notice until the major blunder was pointed out by another engineer at Cloudflare. (This cock-up is completely separate to today's Cloudflare outage.)

How is it even possible for network routes to remain completely wrong for several days? Because, folks, it was on IPv6.

"So Airtel AS9498 announced the entire IPv6 block 2400::/12 for a week and no-one notices until Tom Strickx finds out and they confirm it was a typo of /127," Fleury tweeted over the weekend, complete with graphic showing the massive routing error.

That /12 represents about 83 decillion (2^116) IP addresses, or some 4.5 quadrillion /64 networks. The /127 would be 2. Just 2 IP addresses. Slight difference. And while this demonstrates the expansiveness of IPv6's address space, and perhaps even its robustness, seeing as nothing seems to have actually broken during the routing screw-up, it also hints at just how sparsely IPv6 is used right now.
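The arithmetic is easy to check: an IPv6 prefix of length /n covers 2^(128 - n) addresses. The sketch below uses Python's standard `ipaddress` module. Note that the article gives only the intended prefix *length* (/127), not the intended prefix itself, so `2400::/127` here is a hypothetical stand-in chosen for illustration.

```python
import ipaddress

def address_count(prefix: str) -> int:
    """Number of individual addresses covered by an IPv6 prefix."""
    return ipaddress.IPv6Network(prefix).num_addresses

def subnet_count(prefix: str, new_prefixlen: int) -> int:
    """Number of subnets of length new_prefixlen that fit inside prefix."""
    net = ipaddress.IPv6Network(prefix)
    if not net.prefixlen <= new_prefixlen <= 128:
        raise ValueError("new_prefixlen must be between the prefix length and 128")
    # Each extra bit of prefix length halves the block, so a /p
    # holds 2**(new - p) subnets of length /new.
    return 2 ** (new_prefixlen - net.prefixlen)

leaked = "2400::/12"
# Hypothetical stand-in: only the length (/127) of the intended prefix is known.
intended = "2400::/127"

print(f"{leaked}: {address_count(leaked):.3e} addresses")
print(f"{leaked}: {subnet_count(leaked, 64):,} /64 networks")
print(f"{intended}: {address_count(intended)} addresses")
```

Counting the /64s arithmetically rather than enumerating them with `subnets()` matters here: iterating over 2^52 networks would never finish.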

To be fair to Airtel, it often takes someone else to notice a network route error – typically caused by simple typos like failing to add a "7" – because the organization that messes up the tables tends not to see or feel the impact directly.
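One way a neighbour can catch such a typo before it spreads is ingress prefix filtering, the kind of check MANRS encourages: accept a customer's announcement only if it falls inside a block that customer is registered to originate. The Python sketch below illustrates just that rule. The `PrefixRule` structure, the `accept` function and the `2400:1000::/32` block are all hypothetical (the /32 is an arbitrary block inside 2400::/12, not meant to identify any real network); in practice operators typically generate such lists from routing-registry data and load them into router configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class PrefixRule:
    """One entry in a per-customer prefix list (hypothetical structure)."""
    network: ipaddress.IPv6Network  # block the customer is registered to originate
    max_length: int                 # longest prefix length accepted inside it

def accept(announcement: str, rules: Iterable[PrefixRule]) -> bool:
    """Accept an announcement only if some rule's block contains it
    and it is no more specific than that rule allows."""
    net = ipaddress.IPv6Network(announcement)
    return any(
        net.subnet_of(rule.network) and net.prefixlen <= rule.max_length
        for rule in rules
    )

# Hypothetical prefix list for one customer.
CUSTOMER_RULES = [
    PrefixRule(ipaddress.IPv6Network("2400:1000::/32"), max_length=48),
]

for candidate in ["2400:1000::/32", "2400:1000:5::/48", "2400::/12"]:
    verdict = "accept" if accept(candidate, CUSTOMER_RULES) else "reject"
    print(f"{candidate:>18}  {verdict}")
```

A fat-fingered supernet such as 2400::/12 fails the `subnet_of` test because it is far larger than anything on the customer's list, so a filtering neighbour would drop it rather than pass it on.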

But if ever there was a symbol of how miserably the transition from IPv4 to IPv6 is going, it's the fact that a fat IPv6 routing error went completely unnoticed for a week, while an IPv4 error will usually result in phone calls, emails, and outcry on social media within minutes.

And sure, IPv4 space is much, much more dense than IPv6 so obviously people will spot errors much faster. But no one at all noticed the advertisement of a /12 for days? That may not bode well for the future, even though, yes, this particular /127 typo had no direct impact.

Everyday experience

Y'know what? Maybe it was noticed, and people have grown so used to IPv6 being a little unreliable thanks to countless fudges and fixes that engineers keep imposing on the existing system – instead of shifting to IPv6 properly – that it didn’t seem too out of the ordinary.

Perhaps it went unnoticed because automated systems ignored it, preferring more specific, working routes, and nothing at all raised any alarms.
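That explanation rests on how IP forwarding picks a route: longest-prefix match. For any destination, the most specific route covering it wins, so a bogus /12 only attracts traffic for addresses that no more-specific route covers. The toy Python sketch below shows the selection rule. The table, its labels and the `2400:1000::/32` route are hypothetical, and real routers use specialised data structures rather than a linear scan.

```python
import ipaddress
from typing import Dict, Optional

# Toy forwarding table (hypothetical): the leaked 2400::/12 alongside one
# arbitrary more-specific /32 standing in for a legitimately announced route.
TABLE: Dict[ipaddress.IPv6Network, str] = {
    ipaddress.IPv6Network("2400::/12"): "leaked /12",
    ipaddress.IPv6Network("2400:1000::/32"): "legitimate /32",
}

def lookup(destination: str, table: Dict[ipaddress.IPv6Network, str]) -> Optional[str]:
    """Return the route chosen by longest-prefix match, or None if nothing covers it."""
    addr = ipaddress.IPv6Address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins, however large
    # the less-specific covering route is.
    return table[max(matches, key=lambda net: net.prefixlen)]

print(lookup("2400:1000::1", TABLE))  # covered by both; the /32 wins
print(lookup("2408:abcd::1", TABLE))  # only the /12 covers it
print(lookup("2001:db8::1", TABLE))   # no route in this toy table
```

Only the second lookup lands on the leaked route, which is one plausible reason a bad covering announcement in sparsely used space can sit there quietly.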

There are now quite a few different sources on how IPv6 adoption is going: the Internet Society has compiled most of the good ones in a single place. But while internet organizations continue to insist that things are going well, with, say, the Americas offering 31 per cent IPv6 capability, it may be time to start digging into the stats that really matter: actual usage.

Google currently claims that 28 per cent of its visitors are using IPv6. We don't buy it. It's more likely that it's 28 per cent of connections, rather than actual users. And we wonder how much of that is automated traffic that comes from Google's own systems.

Just as routing errors have drawn attention to the fact that the internet is too strongly reliant on trust and is often held together by string and willpower, this error reveals that IPv6, more than 20 years after its inception, is still dangerously lagging in actual adoption.

And considering an entire block went AWOL, it only strengthens the argument that every internet provider and infrastructure organization needs to get on board with the Mutually Agreed Norms for Routing Security (MANRS), add filtering and anti-spoofing, and do more coordination and validation. ®

Biting the hand that feeds IT © 1998–2019