O'Reilly Site Reliability Engineering Chapter


Issue link: https://hub.dyn.com/i/961134


mal" fashion. But what does "optimal" mean in this context? There's actually no single answer, because the optimal solution depends heavily on a variety of factors:

• The hierarchical level at which we evaluate the problem (global versus local)
• The technical level at which we evaluate the problem (hardware versus software)
• The nature of the traffic we're dealing with

Let's start by reviewing two common traffic scenarios: a basic search request and a video upload request. Users want to get their query results quickly, so the most important variable for the search request is latency. On the other hand, users expect video uploads to take a non-negligible amount of time, but also want such requests to succeed the first time, so the most important variable for the video upload is throughput. The differing needs of the two requests play a role in how we determine the optimal distribution for each request at the global level:

• The search request is sent to the nearest available datacenter, as measured in round-trip time (RTT), because we want to minimize the latency on the request.
• The video upload stream is routed via a different path, perhaps to a link that is currently underutilized, to maximize the throughput at the expense of latency.

But on the local level, inside a given datacenter, we often assume that all machines within the building are equally distant to the user and connected to the same network. Therefore, optimal distribution of load focuses on optimal resource utilization and protecting a single server from overloading.

Of course, this example presents a vastly simplified picture. In reality, many more considerations factor into optimal load distribution: some requests may be directed to a datacenter that is slightly farther away in order to keep caches warm, or non-interactive traffic may be routed to a completely different region to avoid network congestion.
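The global-level choice described above can be sketched in a few lines of Python. This is only an illustration of the simplified picture, not how any production system decides: the `Datacenter` type, its fields, the request kind strings, and `pick_datacenter` are all hypothetical names introduced here, and the sketch assumes RTT and link utilization have already been measured somewhere else.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Datacenter:
    """Hypothetical view of one datacenter as seen from a single user."""
    name: str
    rtt_ms: float            # measured round-trip time to the user
    link_utilization: float  # fraction of link capacity in use, 0.0 to 1.0
    available: bool = True


def pick_datacenter(request_kind: str, datacenters: Sequence[Datacenter]) -> Datacenter:
    """Choose a target according to what the request cares about most."""
    candidates = [dc for dc in datacenters if dc.available]
    if not candidates:
        raise RuntimeError("no datacenter is available")

    if request_kind == "search":
        # Latency matters most: take the nearest datacenter by RTT.
        return min(candidates, key=lambda dc: dc.rtt_ms)
    if request_kind == "video_upload":
        # Throughput matters most: take the least-utilized link,
        # accepting a higher RTT if necessary.
        return min(candidates, key=lambda dc: dc.link_utilization)
    raise ValueError(f"unknown request kind: {request_kind!r}")
```

Given a nearby datacenter whose link is nearly full and a distant one whose link is mostly idle, a "search" request goes to the nearby one and a "video_upload" request goes to the distant one. Cache warmth and regional congestion, mentioned above as real-world considerations, are deliberately left out.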
Load balancing, especially for large systems, is anything but straightforward and static. At Google, we've approached the problem by load balancing at multiple levels, two of which are described in the following sections. For the sake of presenting a concrete discussion, we'll consider HTTP requests sent over TCP. Load balancing of stateless services (like DNS over UDP) differs slightly, but most of the mechanisms described here should be applicable to stateless services as well.

Load Balancing Using DNS

Before a client can even send an HTTP request, it often has to look up an IP address using DNS. This provides the perfect opportunity to introduce our first layer of load balancing: DNS load balancing. The simplest solution is to return multiple A or AAAA records in the DNS reply and let the client pick an IP address arbitrarily. While conceptually simple and trivial to implement, this solution poses multiple challenges.

2 | Chapter 19: Load Balancing at the Frontend
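From the client's side, the simplest DNS approach described above might look like the following minimal sketch. The helper names `resolve_all` and `pick_address` are hypothetical; `socket.getaddrinfo` is the standard-library resolver call, and it returns whatever A and AAAA records the system resolver provides. Real clients often apply their own address-selection rules instead of a uniform random choice, so treat this only as an illustration of "pick an IP address arbitrarily".

```python
import random
import socket
from typing import List, Sequence


def resolve_all(hostname: str, port: int = 80) -> List[str]:
    """Return every distinct IP address the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def pick_address(addresses: Sequence[str], rng: random.Random = random.Random()) -> str:
    """Pick one address uniformly at random (the 'arbitrary' client choice)."""
    if not addresses:
        raise ValueError("DNS reply contained no addresses")
    return rng.choice(list(addresses))
```

A caller would combine the two as `pick_address(resolve_all("www.example.com"))`. The service operator has no influence over which record a given client ends up using, which hints at why this approach is limited.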
