We’ve been writing a series of posts about Building Applications for Resiliency. In the last post of the series, we talked about Circuit Breakers, retrying and timeouts. Using those patterns, we’re already on our way producing reliable systems.
Breaking systems is still easy enough though. Therefore, today, we want to discuss the next brick in the wall. Exponential Back-off Algorithms.
Fear Not, Be Slow
Out of the blue, it is facing the real world. This parallel dimension, called Production. And in here, hell broke loose a long time ago.
To mitigate the storm of requests, the people of Production-Land started to use a weird technique. They introduce an explicit slowdown, but why?
Coming back to our reality as developers, slowing something down never makes sense, except for building a loading screen! Anyway, slowing down a service doesn’t really seem meaningful. If there are a large number of requests, we want to get faster, not slower.
Fear not though. The question is only which side of the requests we’re going to slow down.
Freaking Out As A Service
In Germany, we have a saying that claims that everything has just one end, except for sausages, which has two ends.
We know, however, that our service request has two ends too. A sender (source) and a receiver (target). The request is being sent from the source to the receiver, which does its magic and responds.
If we overload the receiving end though, requests get dropped, latency goes up, many different things start to happen. Most of those things have one specific thing in common, we don’t like their result. Often in many ways.
Imagine the website you want to visit doesn’t work. The Response time is too high, or images are just not loading.
What is the first line of defense? You hit F5 (or Control+R if you’re on a Mac). Now increase the range of this thought experiment, out of your computer and into a server application, that starts to retry. Imagine the poor server on the other end that may already be massively overloaded when the retry comes in. The result? Even more load.
Slowdown To The Rescue
For real users in front of the screen, the easiest way to resolve this situation is using the “timeouts” we discussed last time.
Remember, tell the user to retry “later”, not just retry or you create the exact same situation we had in our thought experiment.
That said, by telling the user to come back later we already built our first Back-off Algorithm. And with that, we (hopefully) slowed down the producer side, namely our user.
The same system or scheme can be used with services talking to each other.
Service A wants to talk to service B but it is already overloaded, instead of returning an error or queuing up requests (which leads to a major latency spike), service B responds with a Throttling Response to come back later.
A Throttling Response is basically a request to slow down and therefore the same as asking the user to come back later. Normally it includes a time to wait before retrying. What effectively happens, we’re slowing down the producer.
If you ever built a REST API client, you most probably had to deal with those kinds of slowdown requests. They’re basically the same as rate limits which act as a simple back-off and precaution to prevent overloading a system in the first place.
The IETF (Internet Engineering Task Force) actually works on specifying and generalizing headers for rate limiting and throttling.
The Exponential Growth
Building on what we know about back-off algorithms, we can now come to the interesting bit. Let’s go back to our overloaded server. Sending throttling responses might solve the situation, except maybe for long-running operations.
That said, it normally is combined with some exponential formula, calculating the back-off time. The more retries a calling service has, the more the delay between requests can grow.
Unfortunately, nobody seems to agree on the perfect formula to make that happen. There are two formulas with significant usage since they’re used by Google and AWS.
AWS: random(0, 2^min(n, 8)) 
In both implementations n is the number of retries and increasing the number increases the delay between two consecutive requests.
The idea for the exponential approach is simple. The longer the high load, the issue or maybe failure is, the less will service A try to send its request. That results in even less load and a better chance for service B to come back to a healthy state.
Btw, this can be implemented on both ends. That means, if service B doesn’t implement an exponential back-off algorithm or, in the worst case, no back-off strategy at all, it is perfectly fine to implement it in the client. Especially since it decreases the error rate, the traffic and maybe log file size.
Living On The Edge
With all that said, there is no convenient answer to the question of how to implement an exponential back-off algorithm in your own system. Especially since it heavily depends on your use case. An exponential back-off with a maximum of 64 seconds between retries doesn’t make sense if the transferred value is outdated after 10 seconds.
In general cases, I’d recommend staying with known and battle-tested algorithms as long as they make sense for your system. Those two above are not the only ones, look around similar systems and there’s a high chance you find a formula that is a perfect match.
Monitor Your Systems
Using Instana to analyze the usefulness of your exponential back-off algorithm though, is super meaningful and important.
You are not only able to capture all the requests and responses, but you’ll also see how the behavior changes with or without the back-off. This is super important during load tests, but even better in production!
If you don’t have an Instana license yet, sign up for the 14-days free trial and get automatic monitoring and performance insight into your services and applications in less than 5 minutes.