US states join watchdog probing CenturyLink's Xmas data center outage that screwed 911 system
TITSUP network card fingered for dropped calls (that's a Total Inability To Send Usable Packets)
Wyoming is the latest US state to formally probe CenturyLink's network outage, which black-holed 911 calls over Christmas.
America's comms watchdog the FCC, and regulators in Washington state, are also investigating the blunder – asking exactly how it happened, and why it took so long to resolve – along with Wyoming's Public Service Commission, which joined the fray this week.
On Thursday, December 27, CenturyLink's external cloud network fell over, knackering its data centers in California, New York, Virginia, the UK, Singapore, and elsewhere. The upshot: unlucky subscribers' business and home broadband, TV, and phone services lost connectivity. Private Ethernet lines were also knocked offline among other services. The outage also apparently took down a few cash machines.
Crucially, a number of 911 call centers across the US, from East to West Coast, relying on CenturyLink for network connections were hit, and couldn't take emergency calls from folks. From Massachusetts to Colorado to Washington state, 911 dispatchers in various centers reported they were unreachable via the emergency number, or that their phone or internet access was patchy.
Police and fire departments took to Twitter, Facebook, and the mainstream media to tell people to phone for help directly using 10-digit numbers. At one point, CenturyLink staff suggested citizens drive to their nearest fire station, cop shop, or hospital, if there was an emergency.
By Saturday, the outage was over: CenturyLink had pinpointed and fixed a rogue "network element" causing the disruption. According to a memo to customers, that element turned out to be a single broken network management card in Colorado that was firing badly formed frames into its network. This invalid traffic cascaded into a much larger cockup by confusing or taking out connecting systems.
The card was replaced to end the outage, and packet filters were installed to hopefully prevent it from happening again. During that work, techies in the field had to locally log into equipment to apply changes, or replace or reset line cards.
Initially, network admins in New Orleans spotted that services were wobbly, and engineers scoured the ISP's network map and logs for the culprit. Secondary communication channels were also disabled in case they were at fault, and were switched back on when the real problem was identified. Ultimately, techies in Georgia, Illinois, Missouri, and beyond, were led to the dodgy management card in Denver, which was replaced.
"CenturyLink knows how important connectivity is to our customers, so we view any disruption as a serious matter and sincerely apologize for any inconvenience that resulted," the ISP said in a statement on its status page. We also note that the broken card has been sent to its manufacturer for close inspection.
FCC boss Ajit Pai didn't hold back last week, condemning the outage and launching an inquiry into the downtime. "The CenturyLink service outage is ... completely unacceptable, and its breadth and duration are particularly troubling," thundered Pai.
"I’ve directed the Public Safety and Homeland Security Bureau to immediately launch an investigation into the cause and impact of this outage."
Just a couple of weeks before the dodgy network card caused so much upset, Lisa Fowlkes, the watchdog's Public Safety and Homeland Security Bureau chief, blogged, "There is no higher priority at the FCC than promoting reliable 911 service." Ooof.
In 2015, CenturyLink was fined $16m for a six-hour 911 outage. We wonder what the going rate for 48 hours is. Double ooof. ®