- Type: Task
- Resolution: Unresolved
- Priority: Unknown
- None
- Component/s: Retryability
- None
- Not Needed
Investigate the benefits of a server-controlled "retryAfter" backoff mechanism. The idea is that under a prolonged overload, the client will still retry multiple times, and those retries keep failing while the overload lasts. The server can reduce the total number of failed retry attempts by including a "retryAfter" value in overload responses.
Some comments:
- We already reduce the risk of retry amplification with exponential backoff, jitter, and a retry token bucket. Reducing retry attempts even further may be a useful improvement, but it is one that can be added later.
- The server would need to estimate how long the overload will last in order to tell the client accurately how long to back off.
- When one server tells the client to back off via "retryAfter" but another server is healthy and available now, should the client still wait? If yes, we potentially increase latency in this scenario. If no, server selection for retries may become more complicated.
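To make the trade-offs above concrete, here is a minimal sketch of how a client might combine its existing exponential backoff and jitter with a server-supplied hint. Everything in it is an assumption for illustration: the function name `backoff_ms`, the constants, the millisecond unit, and the `same_server` parameter are hypothetical and not part of any existing driver API. It also encodes only one possible answer to the server selection question, namely that the hint applies only when the retry targets the server that sent it.

```python
import random
from typing import Callable, Optional

# Assumed values, chosen only for illustration.
BASE_BACKOFF_MS = 100
MAX_BACKOFF_MS = 10_000


def backoff_ms(
    attempt: int,
    retry_after_ms: Optional[int] = None,
    same_server: bool = True,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return the delay before retry number `attempt` (1-based).

    Without a hint this is exponential backoff with full jitter.
    With a hint, the hint acts as a lower bound on the delay, but
    only if the retry goes to the server that sent the hint.
    """
    # Exponential growth, capped so the delay cannot grow without bound.
    cap = min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** (attempt - 1))
    # Full jitter: pick a point in [0, cap) to spread retries out.
    jittered = rng() * cap

    if retry_after_ms is None or not same_server:
        # No hint, or the retry targets a different (possibly healthy)
        # server: fall back to the client-side delay.
        return jittered

    # Never retry sooner than the overloaded server asked for.
    return max(jittered, float(retry_after_ms))
```

Treating the hint as a lower bound rather than a replacement keeps the jitter when the server's estimate is short. Setting `same_server=False` corresponds to the "no" branch of the last question: the client does not wait on the hint, at the cost of server selection having to track which server issued it.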