O'Reilly Site Reliability Engineering Chapter


Issue link: https://hub.dyn.com/i/961134


Chapter 19: Load Balancing at the Frontend

Finally, recursive resolvers typically cache responses and forward those responses within limits indicated by the time-to-live (TTL) field in the DNS record. The end result is that estimating the impact of a given reply is difficult: a single authoritative reply may reach a single user or multiple thousands of users. We solve this problem in two ways:

• We analyze traffic changes and continuously update our list of known DNS resolvers with the approximate size of the user base behind a given resolver, which allows us to track the potential impact of any given resolver.
• We estimate the geographical distribution of the users behind each tracked resolver to increase the chance that we direct those users to the best location.

Estimating geographic distribution is particularly tricky if the user base is distributed across large regions. In such cases, we make trade-offs to select the best location and optimize the experience for the majority of users.

But what does "best location" really mean in the context of DNS load balancing? The most obvious answer is the location closest to the user. However (as if determining users' locations isn't difficult in and of itself), there are additional criteria. The DNS load balancer needs to make sure that the datacenter it selects has enough capacity to serve requests from users that are likely to receive its reply. It also needs to know that the selected datacenter and its network connectivity are in good shape, because directing user requests to a datacenter that's experiencing power or networking problems isn't ideal. Fortunately, we can integrate the authoritative DNS server with our global control systems that track traffic, capacity, and the state of our infrastructure.

The third implication of the DNS middleman is related to caching. Given that authoritative nameservers cannot flush resolvers' caches, DNS records need a relatively low TTL. This effectively sets a lower bound on how quickly DNS changes can be propagated to users.[2] Unfortunately, there is little we can do other than to keep this in mind as we make load balancing decisions.

Despite all of these problems, DNS is still the simplest and most effective way to balance load before the user's connection even starts. On the other hand, it should be clear that load balancing with DNS on its own is not sufficient. Keep in mind that all DNS replies served should fit within the 512-byte limit[3] set by RFC 1035. This limit sets an upper bound on the number of addresses we can squeeze into a single DNS reply, and that number is almost certainly less than our number of servers.

[2] Sadly, not all DNS resolvers respect the TTL value set by authoritative nameservers.
[3] Otherwise, users must establish a TCP connection just to get a list of IP addresses.
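To make the 512-byte bound concrete, the following rough calculation estimates how many IPv4 addresses fit in one reply. It is a sketch under stated assumptions, not a description of any particular nameserver: the reply carries only the echoed question and A records, every answer's owner name is compressed to a 2-byte pointer, no EDNS0 extension is in use, and the hostname in the usage note is purely illustrative.

```python
# Sizes in bytes, following the RFC 1035 wire format.
UDP_LIMIT = 512      # maximum DNS message size over UDP without EDNS0
HEADER = 12          # fixed-size DNS header
QTYPE_QCLASS = 4     # 2-byte QTYPE + 2-byte QCLASS in the question section

# One A record whose owner name is a 2-byte compression pointer (assumption):
# NAME(2) + TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2) + RDATA(4, the IPv4 address)
A_RECORD = 2 + 2 + 2 + 4 + 2 + 4


def encoded_name_length(name: str) -> int:
    """Wire-format length of a domain name: one length byte per label,
    the label bytes themselves, and a terminating zero byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def max_a_records(name: str) -> int:
    """Upper bound on A records in a 512-byte reply that contains
    nothing but the question and the answers."""
    question = encoded_name_length(name) + QTYPE_QCLASS
    return (UDP_LIMIT - HEADER - question) // A_RECORD
```

Under these assumptions, `max_a_records("www.example.com")` gives 29: the question takes 21 bytes, leaving 479 bytes for 16-byte answers. Any authority or additional records in the reply would lower that number further.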

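The "best location" criteria described above (proximity, spare capacity, and health) can be sketched as a simple filter followed by a nearest-first choice. This is only an illustration, not the selection logic of any real DNS load balancer: the `Datacenter` fields, the `pick_datacenter` function, and the idea of expressing a resolver's estimated user base as an expected query rate are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datacenter:
    name: str
    distance_km: float    # estimated distance to the users behind a resolver
    capacity_qps: float   # total serving capacity
    load_qps: float       # traffic already being served
    healthy: bool         # power and network connectivity are in good shape


def pick_datacenter(datacenters: List[Datacenter],
                    expected_qps: float) -> Optional[Datacenter]:
    """Return the closest healthy datacenter with enough spare capacity
    for the traffic expected from one resolver, or None if none qualifies."""
    candidates = [
        dc for dc in datacenters
        if dc.healthy and dc.capacity_qps - dc.load_qps >= expected_qps
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: dc.distance_km)
```

The nearest datacenter is therefore not always the one selected: an unhealthy or fully loaded site is skipped in favor of a more distant one that can actually serve the users likely to receive the reply.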