Updated The domain name server "OpenSRS" crashed and burned during the dead of night following a network failure, taking down scores of customers' portals with it. Normal service has yet to resume.
OpenSRS is owned by internet services company Tucows, and according to builtwith.com, about 690,000 active websites use it.
Meanwhile, domain name manager Hover – another brand owned by Tucows and underpinned by wholesale services provided by OpenSRS – has also hit the mat.
At 0144 UTC (0244 BST) this morning, the OpenSRS status page said it had found unspecified "network connectivity issues" causing "degraded performance" when loading control panels and webmail.
A couple of hours later at 0352 UTC (0452 BST), the page also indicated DNS services and its API were affected.
Hover customer and Reg reader Jeff told us: “We have about 15 domains hosted there and we started seeing alerts from our monitoring service at 03.45 this morning. The only thing we have to go on is the twitter status. Hover's own website was unavailable last time I looked and we've had no information from them.”
We have contacted Hover for comment and will update this article if we hear back from them. It appears that the outage ended late this morning London time, though we have yet to hear from Hover itself.
Hover’s own status update page had this to say at the time of writing:
[Investigating] It has come to our attention that some search results for new domains are not providing results. This is resulting in a "Oops! There was a problem finding suggestions for you." message We are currently investigating the root cause of this issue. Thank you for your patience.
[Identified] Our operations team continues to work on resolving the error encountered when searching for new domains on the Hover.com website. Some customers have also experienced a brief intermittent issue with DNS settings not resolving. This is related to a network issue that our team is currently fixing. We appreciate your patience while we get this resolved and will continue to update you here as more information becomes available.
[Identified] Our operations team continues to work on fixing various issues being encountered at this time due to a network issue. Currently customers may experience issues with the following:
- Access to Hover.com
- Access to the Hover Help Center
- DNS resolution
- Access to Hover webmail portal
We are working diligently to fix this issue and thank you for your patience while we resolve this issue.
On the bright side, Hover's webmail portal, which had been KO’d yesterday, is now back online for POP and IMAP users – so Hover says.
OpenSRS and Hover users have shared their distress on Twitter.
@OpenSRS Massive probs with DNS and hosted email services. Bad communication. Must have effected hundreds of thousands of customers.— Henning Geiler (@henninggeiler) September 29, 2017
wow. try explaining this to a client: a failure at DNS level of domain registrar Tucows/@OpenSRS has left all sites down and out.— admataz (@admataz) September 29, 2017
@HoverStatus I'd like to know two things, in reverse order:
2. What went wrong?
1. When is it expected to be fixed?— Hans van Dijk (@hansvandijk1603) September 29, 2017
A spokesperson for Tucows has not responded to a request for comment. A support agent reached at OpenSRS's North America phone line referred The Register to the status page for additional updates.
The latest update at 1055 UTC (1155 BST) indicated that "we are now seeing significant improvement to the intermittent DNS issue impacting all OpenSRS services, however services are not fully restored yet."
There is no additional information yet on the cause of the outage or an ETA for a fix, but OpenSRS thanked customers for their patience... so that's OK then. ®
Updated to add
OpenSRS has said it's back online. "At 1AM UTC we were the target of a sophisticated DNS attack that was followed by an unrelated double failure of core network equipment at our main Canadian data center, caused by an undocumented software limitation," explained executive veep Dave Woroch.
"We were able to quickly recover from the equipment failure but continued to experience the DNS attack until 13:10 UTC, when the attack was stopped and systems started responding reliably again. The network equipment failure made it more difficult for us to identify that we were under a DNS attack and impacted our response time."