- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
-
Workload Resilience
-
Workload Resilience 2026-02-16
Retry strategies rely on a core invariant: a given retry strategy instance evaluates retries for exactly one task. An instance cannot be reused across multiple requests, because doing so leaves it in an unplanned internal state. This kind of misuse has already happened when users of this component attempted to implement their own retry loop: SERVER-108330 Use RetryStrategy in WithAutomaticRetry
This invariant is not enforced in the code, but it remains a core assumption of the retry strategy implementations.
To prevent this kind of misuse, we should track when a retry strategy is considered done and invariant that recordSuccess, recordBackoff, and recordFailureAndEvaluateShouldRetry are not called after that point.
Violating this invariant can lead to either excessive or insufficient retries, and both problems affect availability. It can also cause mismatches in our FTDC metrics, which affects our ability to diagnose problems with retries.
We should implement these invariants in all non-wrapping implementations of retry strategies.
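As a rough illustration of the proposal, here is a minimal, standalone sketch of a strategy that tracks a "done" flag and checks it on every entry point. It is not the server's implementation. The class name `SingleUseRetryStrategy`, the helper `checkInvariant`, the simplified signatures, and the retry-budget logic are all hypothetical. The sketch assumes a strategy becomes done once `recordSuccess` is called or once `recordFailureAndEvaluateShouldRetry` returns false, and it throws instead of aborting (as the real `invariant` would) so that it can be exercised in isolation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the server's invariant(): throws instead of
// aborting the process so the sketch can be exercised in isolation.
inline void checkInvariant(bool condition, const char* message) {
    if (!condition) {
        throw std::logic_error(message);
    }
}

// Hypothetical, simplified single-use strategy. The real interface takes
// more arguments (status, error labels, ...); only the "done" tracking
// is the point of this sketch.
class SingleUseRetryStrategy {
public:
    explicit SingleUseRetryStrategy(int maxRetries) : _maxRetries(maxRetries) {
        assert(maxRetries >= 0);
    }

    // Returns true if the caller should retry. Returning false marks the
    // strategy as done (assumption of this sketch).
    bool recordFailureAndEvaluateShouldRetry() {
        _assertNotDone();
        if (_retries >= _maxRetries) {
            _done = true;
            return false;
        }
        ++_retries;
        return true;
    }

    void recordBackoff() {
        _assertNotDone();
        ++_backoffs;
    }

    // A success ends the task, so the strategy is done afterwards.
    void recordSuccess() {
        _assertNotDone();
        _done = true;
    }

    bool isDone() const {
        return _done;
    }

private:
    void _assertNotDone() const {
        checkInvariant(!_done, "retry strategy used after it was done");
    }

    int _maxRetries;
    int _retries = 0;
    int _backoffs = 0;
    bool _done = false;
};

// Returns true if reusing a finished strategy for a second request is detected.
inline bool reuseAfterSuccessIsDetected() {
    SingleUseRetryStrategy strategy(2);
    strategy.recordSuccess();  // First request completes.
    try {
        strategy.recordFailureAndEvaluateShouldRetry();  // Second request: misuse.
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With a check like this in place, a hand-written retry loop that reuses one strategy instance across requests would fail on the first call after completion, rather than silently skewing retry counts or metrics.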
- is related to
  - SERVER-108330 Update WithAutomaticRetry to support RetryStrategy (Closed)
- related to
  - SERVER-118493 Reconsider uses of shared pointer to retry strategy in async_rpc and AsyncTry (Backlog)