UPDATE: July 7, 2015 @ 14:30 EDT
We’d like to give an update on yesterday’s issue with Dyn’s Managed DNS service and the steps we’re taking to address it.
The issue involved the availability of name server address records in the dynect.net zone. For background, Dyn serves the zones of most of our Dyn Managed DNS customers on name servers named in the dynect.net zone. That zone contains the address records for these name servers, which have names such as “ns1.p01.dynect.net”. The contents of the dynect.net zone are generated automatically from data in an IP address management (IPAM) system. Separately, Dyn uses an automation framework to maintain configurations on all our servers. Because of an unforeseen corner case, the two systems interacted in an unexpected way, and many of the name server address records were removed from the dynect.net zone.
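To illustrate why a zone generated from IPAM data can lose records without any error, here is a minimal sketch. It assumes a hypothetical IPAM row with a hostname, an address and a role tag, and a generator that publishes only rows tagged as name servers. Dyn’s actual IPAM schema and the specific corner case are not described here, so `IpamEntry`, `generate_ns_address_records` and the `role` values are illustrative only. Addresses come from the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass, replace

# Hypothetical IPAM row; the real IPAM schema is not described in this post.
@dataclass(frozen=True)
class IpamEntry:
    hostname: str  # fully qualified name server name, e.g. "ns1.p01.dynect.net."
    address: str   # IPv4 address (documentation range used below)
    role: str      # hypothetical tag; only "nameserver" rows are published

def generate_ns_address_records(entries, ttl=86400):
    """Render one zone-file A record line per IPAM entry tagged as a name server."""
    return [
        f"{e.hostname}\t{ttl}\tIN\tA\t{e.address}"
        for e in sorted(entries, key=lambda e: e.hostname)
        if e.role == "nameserver"
    ]

ipam = [
    IpamEntry("ns1.p01.dynect.net.", "192.0.2.1", "nameserver"),
    IpamEntry("ns2.p01.dynect.net.", "192.0.2.2", "nameserver"),
]
print("\n".join(generate_ns_address_records(ipam)))

# If another system rewrites a row (here, its role tag), the generator
# silently drops that record on the next run; nothing flags the omission.
ipam[1] = replace(ipam[1], role="decommissioned")
print("\n".join(generate_ns_address_records(ipam)))
```

The generator is correct for its input. The danger is that a change made elsewhere, for example by a configuration framework, turns into a missing record on the next regeneration.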
Fortunately, most of the address records had a time to live (TTL) value of 24 hours. This mitigated the effect of their temporary removal, because the records remained present in many recursive name servers’ caches throughout the Internet. On any recursive server that still had the name server address records cached, resolution of a Dyn customer’s domains continued normally. Based on the monitoring and metrics available to us, we believe this was the case in most instances.
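The caching effect described above can be modelled with a toy TTL cache. This is a simplified sketch, not how any particular resolver is implemented. It ignores negative caching and other resolver behavior, and it assumes the record was fetched at time zero. In practice each resolver may have cached the record at a different point, so the remaining cache lifetime varies from resolver to resolver. `RecursiveCache` and the injectable `clock` are hypothetical names used for illustration.

```python
import time

class RecursiveCache:
    """Toy TTL cache modelling how a recursive resolver keeps records it already fetched."""

    def __init__(self, authoritative, clock=time.monotonic):
        self.authoritative = authoritative  # name -> (address, ttl_seconds); stands in for the zone
        self.clock = clock                  # injectable so the example can fast-forward time
        self.cache = {}                     # name -> (address, expiry_time)

    def resolve(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh: answered without consulting the zone
        self.cache.pop(name, None)  # expired or absent: drop any stale entry
        record = self.authoritative.get(name)
        if record is None:
            return None  # not cached and gone from the zone: resolution fails
        address, ttl = record
        self.cache[name] = (address, now + ttl)
        return address

fake_now = [0.0]
zone = {"ns1.p01.dynect.net.": ("192.0.2.1", 86400)}  # 24-hour TTL
resolver = RecursiveCache(zone, clock=lambda: fake_now[0])

print(resolver.resolve("ns1.p01.dynect.net."))  # 192.0.2.1 (fetched and cached)
del zone["ns1.p01.dynect.net."]                 # record removed from the zone
fake_now[0] = 3 * 3600
print(resolver.resolve("ns1.p01.dynect.net."))  # 192.0.2.1 (served from cache)
fake_now[0] = 25 * 3600
print(resolver.resolve("ns1.p01.dynect.net."))  # None (TTL expired, record gone)
```

A long TTL widens the window in which a restored zone can be published before cached copies expire. The trade-off is that intentional address changes also take longer to propagate.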
As a result of yesterday’s events, we’re taking the following steps to prevent a recurrence:
- The linkage between the two automated systems has been disabled and will only be restored after we implement additional checks and balances and institute a manual review before critical changes are put in place.
- We’re improving our monitoring to lower the mean time to detection (MTTD) for events like these.
- We’re streamlining our processes to improve customer communication in the future so that our customers are informed more quickly.
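One form that the “checks and balances” in the first step could take is a pre-publication comparison that holds a regenerated zone for manual review when it removes too many name server records. The sketch below is hypothetical. The function name, the mapping of names to addresses and the 5% threshold are assumptions made for illustration, not a description of Dyn’s actual safeguards. A monitoring job could run the same comparison against the live zone to shorten detection time.

```python
from typing import Dict, List, Tuple

def review_zone_change(
    current: Dict[str, str],
    proposed: Dict[str, str],
    max_removed_fraction: float = 0.05,  # hypothetical threshold
) -> Tuple[bool, List[str]]:
    """Return (auto_publish_ok, removed_names) for a proposed set of name server address records.

    auto_publish_ok is False when the proposal drops more than max_removed_fraction
    of the current records; such a change should wait for manual review.
    """
    removed = sorted(name for name in current if name not in proposed)
    if not current:
        return True, removed
    ok = len(removed) / len(current) <= max_removed_fraction
    return ok, removed

# Twelve hypothetical records: ns1-ns4 in each of p01, p02 and p03.
current = {
    f"ns{i}.p{p:02d}.dynect.net.": f"192.0.2.{i + 4 * p}"
    for p in range(1, 4)
    for i in range(1, 5)
}
# A regenerated zone that has silently lost every p02 record.
proposed = {name: addr for name, addr in current.items() if ".p02." not in name}

ok, removed = review_zone_change(current, proposed)
print(ok, removed)  # False, plus the four p02 names
```

Gating on a removal fraction lets routine, small changes publish automatically, while a large and unexpected deletion stops for a human before it reaches the zone.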
We’d again like to thank our partners in the industry for their collaboration during this issue. It is a reminder that there are many people working toward the common good of operating a better Internet and that none of us can do it alone.
An issue such as this one can be a learning experience and a chance to improve. We’re confident that this particular scenario will not recur and that the changes we are putting in place will improve our response to any future events.
July 6, 2015 @ 20:30 EDT
We’d like to apologize to everyone for the issues earlier today with the Dyn Managed DNS service. Thank you to the many folks who wrote in, not only to report what they were experiencing but also to offer support and kind words as our teams worked to correct the issue.
A special thank you goes to our industry friends at CloudFlare for their collaboration, both on behalf of shared customers and more broadly: we all share the common goal of a faster and more secure Internet.
An automated process that creates the dynect.net zone malfunctioned. That zone contains the addresses of the name servers that our customers delegate to (e.g., ns1.p01.dynect.net), and some of those name server addresses were mistakenly omitted. As soon as we identified the issue, we restored the correct version of the zone, and DNS resolution began returning to normal.
We’re still investigating and reviewing logs to get a full picture of what happened, but we’ve disabled the automation and are confident that this issue won’t recur.
During the event, we communicated on our technical status site, DynStatus.com, and pointed customers and partners there for ongoing updates. We encourage all customers to subscribe to DynStatus.com and to contact support with specific questions or issues. We’ve long believed in providing an open view of our current network and product performance so that customers can get the latest updates. A full post-mortem will follow.
Thank you for your continued support and stay tuned.