Implement invariants to prevent misuse of retry strategies


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Workload Resilience
    • Workload Resilience 2026-02-16

      Retry strategies rely on a core invariant: a single retry strategy instance is meant to evaluate retries for exactly one task. An instance cannot be reused across multiple requests, because reuse leaves it in an unplanned internal state. This kind of misuse has already happened when users of this component implemented their own retry loop: SERVER-108330 Use RetryStrategy in WithAutomaticRetry

      This invariant is not enforced in the code, yet it remains a core assumption of the retry strategy implementation.

      To prevent this kind of misuse, we should track when a retry strategy is considered done, and invariant that recordSuccess, recordBackoff and recordFailureAndEvaluateShouldRetry are not called after that point.
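
      A minimal sketch of what such a guard could look like, under stated assumptions: the class name SingleUseRetryStrategy, the checkInvariant helper, the simplified method signatures, and the definition of "done" (after recordSuccess, or after recordFailureAndEvaluateShouldRetry returns false) are all hypothetical and chosen for illustration. Only the three method names come from this ticket. checkInvariant throws so the sketch can be exercised in isolation; the server's invariant() would abort instead.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the server's invariant(): throws instead of
// aborting so that the misuse can be observed in a standalone example.
inline void checkInvariant(bool condition, const char* message) {
    if (!condition) {
        throw std::logic_error(message);
    }
}

// Hypothetical single-use retry strategy. Signatures are simplified and do
// not match the real RetryStrategy interface.
class SingleUseRetryStrategy {
public:
    explicit SingleUseRetryStrategy(int maxRetries) : _maxRetries(maxRetries) {}

    // Returns true if the caller should retry. Once this returns false, the
    // strategy is considered done (an assumption of this sketch).
    bool recordFailureAndEvaluateShouldRetry(bool retryableError) {
        checkInvariant(!_done, "retry strategy used after it was done");
        if (!retryableError || _retries >= _maxRetries) {
            _done = true;
            return false;
        }
        ++_retries;
        return true;
    }

    void recordBackoff(int backoffMillis) {
        checkInvariant(!_done, "retry strategy used after it was done");
        _totalBackoffMillis += backoffMillis;
    }

    // A success ends the task, so the strategy is done afterwards.
    void recordSuccess() {
        checkInvariant(!_done, "retry strategy used after it was done");
        _done = true;
    }

    bool isDone() const { return _done; }
    int totalBackoffMillis() const { return _totalBackoffMillis; }

private:
    const int _maxRetries;
    int _retries = 0;
    int _totalBackoffMillis = 0;
    bool _done = false;
};
```

      With this guard, a caller that reuses one instance for a second request trips the check on the first record call of that request, instead of silently continuing with a stale retry count.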

      Failure to comply with this invariant could lead to excessive or insufficient retries. Both problems affect availability. It can also lead to mismatches in our FTDC metrics, which affects our ability to diagnose problems with retries.

      We should implement these invariants in all non-wrapping implementations of retry strategies.

            Assignee:
            Unassigned
            Reporter:
            Guillaume Racicot
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: