Code crash? Russian hackers? Nope. Good ol' broken fiber cables borked Google Cloud's networking today
Connectivity to us-east1 knackered for hours, still no fix
Fiber-optic cables linking Google Cloud servers in its us-east1 region physically broke today, slowing down or effectively cutting off connectivity with the outside world.
For at least the past nine hours, and counting, netizens and applications have struggled to connect to systems and services hosted in the region, located on America's East Coast. Developers and system admins have been forced to migrate workloads to other regions, or redirect traffic, in order to keep apps and websites ticking over amid mitigations deployed by the Silicon Valley giant.
According to Google, starting at 0755 PDT (1455 UTC) today, the search giant was "experiencing external connectivity loss for all us-east1 zones," with traffic between us-east1 and other regions suffering "approximately 10% loss."
By 0900 PDT, Google revealed the extent of the blunder: its cloud platform had "lost multiple independent fiber links within us-east1 zone." The fiber provider, we're told, "has been notified and are currently investigating the issue. In order to restore service, we have reduced our network usage and prioritised customer workloads."
By that, we understand Google to mean it redirected traffic destined for its own Google.com services hosted in the data center region to other locations, freeing the remaining connectivity to carry customer packets.
By midday, Pacific Time, Google updated its status pages to note: "Mitigation work is currently underway by our engineering team to address the issue with Google Cloud Networking and Load Balancing in us-east1. The rate of errors is decreasing, however some users may still notice elevated latency."
However, at the time of writing, the physically damaged cabling has not yet been fully repaired, and us-east1 networking is thus still knackered. In fact, repairs could take as long as 24 hours to complete. The latest update, posted at 1600 PDT, reads as follows:
The disruptions with Google Cloud Networking and Load Balancing have been root caused to physical damage to multiple concurrent fiber bundles serving network paths in us-east1, and we expect a full resolution within the next 24 hours.
In the meantime, we are electively rerouting traffic to ensure that customers' services will continue to operate reliably until the affected fiber paths are repaired. Some customers may observe elevated latency during this period.
Customers using Google Cloud's Load Balancing service will automatically fail over to other regions, if so configured, minimizing the impact on their workloads, it is claimed. They can also migrate to, say, us-east4, though they may have to rejig their code and scripts to reference the new region.
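For readers whose apps talk to a region directly rather than through a load balancer, the fallback logic boils down to trying regions in order of preference. Here is a minimal, hypothetical sketch of that idea in Python — the region list and the `is_healthy` health-check callable are our own illustrative assumptions, not anything from Google's SDKs or status pages:

```python
# Hypothetical client-side region failover sketch (not Google's API):
# walk an ordered preference list and use the first region that
# passes a health check.

PREFERRED_REGIONS = ["us-east1", "us-east4", "us-central1"]

def pick_region(is_healthy, regions=PREFERRED_REGIONS):
    """Return the first region whose health check passes.

    is_healthy: callable taking a region name and returning True/False,
    e.g. a wrapper around an HTTP health-check endpoint you operate.
    """
    for region in regions:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")

# Example: pretend us-east1 is down, as during this outage.
down = {"us-east1"}
print(pick_region(lambda r: r not in down))  # us-east4
```

In practice you would also want timeouts and caching of the health result, but the ordering trick is the gist of what a multi-region load balancer does for you automatically.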
The Register asked Google for more details about the damaged fiber, such as how it happened. A spokesperson told us exactly what was already on the aforequoted status pages.
Meanwhile, a Google Cloud subscriber wrote a little ditty about the outage to the tune of Pink Floyd's Another Brick in the Wall. It starts: "We don't need no cloud computing..." ®