Rotten routers caused Intermedia service crash, says CEO
Wanna qualify for SLA credits? You've got 72 hours
The service blackout at third-party Microsoft Exchange hosting biz Intermedia was caused by glitches in core routing kit, the CEO has confirmed.
As revealed by El Reg on Tuesday, Intermedia went down in the early afternoon UK time and stayed down for several hours. Customers were forced to vent their spleen on Twitter because they could not contact the company.
Phil Koen, CEO at Intermedia, sent us some comments - which were posted on his blog - saying that normal service delivery has resumed, "there has been no data loss and there were no security breaches".
He revealed that on 28 August the company encountered an "issue with our core routers" and "implemented a fix" that rectified the fault a day later - or so Intermedia thought.
But by yesterday morning, "further anomalies" had been observed, said Koen:
"Attempts at intervention were unsuccessful, and corruption in our core routing table progressed to the edge routers in all our US data centres".
"This created significant packet loss between the edge and core in each data centre and it prevented delivery of service to our customers. In addition, as our communication systems reside in the same data centres, our ability to communicate with customers and partners was disrupted".
Customers on both sides of the pond - folk in the UK claimed to be affected by the outage too - were united in their anger over the lack of communication from Intermedia, and leapt onto Twitter to complain.
Koen said the network and services were brought back online at 3.30pm US Eastern Time, and the final server-related issues were sorted out by 6.30pm.
Intermedia will now complete its Reason For Outage (RFO) report and learn lessons from the incident to "improve stability and resilience" and increase the "responsiveness and robustness of customer notification tools and systems", said the CEO.
"Although we were successful in notifying many of our customers about the issues via alternate email addresses, text messages and HostPilot, not all customers were reached," he added.
Customers have 72 hours to log a call to qualify for SLA credits, though as our readers pointed out yesterday, this does not cover any potential financial losses caused by a dip in productivity or lost business. ®