MS blames lowly techie for Web blackout
Takes 22 hours to fix router config error
Microsoft has blamed a lowly technician for a cock-up which almost completely blocked access to its Web sites for most users yesterday.
From the early hours of yesterday morning until late evening, www.microsoft.com, msn.com, expedia.co.uk and msnbc.com were all unavailable. The software giant's Hotmail service was also inaccessible for many.
The problem, whose final resolution came some six hours after Microsoft promised a fix would be in place yesterday, was due to changes in Microsoft's domain name server network that caused requests to access its Web sites to fail. A fix was eventually put in place when Microsoft removed the configuration changes behind the problem.
In a statement, Microsoft admitted: "At 6:30 p.m. Tuesday (PST), a Microsoft technician made a configuration change to the routers on the edge of Microsoft's Domain Name Server network. The DNS servers are used to connect domain names with numeric IP addresses (eg. 126.96.36.199) of the various servers and networks that make up Microsoft's Web presence.
"The mistaken configuration change limited communication between DNS servers on the Internet and Microsoft's DNS servers. This limited communication caused many of Microsoft's sites to be unreachable (although they were actually still operational) to a large number of customers."
Microsoft has apologised to customers for the problem, which it denies is down to either a technology failure with its or anybody else's products or lax security of its networks. This leaves a lowly techie, who we'd be most interested in talking to, carrying the can for the whole sorry debacle, which hardly inspires much confidence in Microsoft's .Net vision for delivering software as a service over the Internet.
As Register readers pointed out, all four of Microsoft's domain name servers appear to be located on the same subnet - effectively putting all its eggs in one basket.
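For the curious, the readers' eggs-in-one-basket check can be roughed out in a few lines of Python. This is only a sketch: the addresses are hypothetical ones taken from the reserved documentation ranges, not Microsoft's real name servers, the function names are made up for illustration, and treating a /24 as "the same subnet" is an assumption.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(addresses, prefix_len=24):
    """Group IPv4 addresses by the enclosing subnet of the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False derives the enclosing network from a host address
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[net].append(addr)
    return dict(groups)


def single_point_of_failure(addresses, prefix_len=24):
    """Return True when every address sits inside one subnet."""
    return len(group_by_subnet(addresses, prefix_len)) == 1


# Hypothetical name server addresses (documentation ranges, not real servers)
clustered = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
spread = ["192.0.2.10", "198.51.100.10", "203.0.113.10", "192.0.2.11"]

print(single_point_of_failure(clustered))
print(single_point_of_failure(spread))
```

A set-up like the first list can be cut off by one bad router change in front of that subnet, while the second list would still have name servers reachable elsewhere.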
Russ Cooper, editor of security mailing list NTBugtraq, has laid into Microsoft's explanation and said "stupid people do make mistakes, but for a company the size of MS there's really no excuse for such a blunder."
Cooper said: "It seems the corporation [Microsoft] is not sufficiently aware of the importance of DNS, despite its role in both .NET. You think you might have a disaster recovery plan that gets invoked within an hour of confirmation that DNS is out... one that includes checking the router configuration for your DNS 'network'."
He added that the debacle showed that there was no vetting or management sign-off for changes to production networks at Redmond. ®