-
Type:
Spec Change
-
Resolution: Unresolved
-
Priority:
Unknown
-
None
-
Component/s: Retryability
-
None
-
Needed
-
Introduce an adaptive retry policy (based on token buckets) to limit load amplification during peak overload.
Clients will support an adaptive retry policy which aims to overcome metastable failures during overload. Each time a client makes a successful request (ok:1 or a successful error like DuplicateKeyError) it deposits a fractional “token” into a bucket. Each time a request fails (ok:0 with SystemOverloaded error), the client performs retries as normal (with exponential backoff + jitter) as long as there are whole tokens available in the bucket. This approach establishes a limited memory for the operational conditions of the upstream service: if there are tokens available for retry, then the service has been healthy recently.
- related to
-
DRIVERS-3246 Make system overload errors easier to diagnose
-
- Backlog
-
-
SERVER-108583 Add diagnostic metadata to identify retried commands
-
- Needs Scheduling
-
-
DRIVERS-3241 Add diagnostic metadata to retried commands
-
- Backlog
-