Reduce number of custom retry mechanisms in tests

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • 3
    • TBD

      In jstests running in the mongo shell, we often need retry mechanisms that retry a command or a sequence of commands on certain error conditions, e.g. transient network errors or stale configurations.

      We currently have a generic assert.retry function in assert.js to achieve this.
      There is also assert.retryNoExcept.

      In addition, several other library files contain code that handles retries:

      $ ls jstests/libs/*retr*
      jstests/libs/auto_retry_transaction_in_sharding.js  jstests/libs/retryable_mongo.js        jstests/libs/run_with_retries.js
      jstests/libs/command_sequence_with_retries.js       jstests/libs/retryable_writes_util.js
      

      There are also several custom implementations and usages of those in our tests:

      $ grep -r -i -P "(runWithRetries|withRetr)" jstests/ | wc -l
      410
      

      Several of these custom retry functions could be rewritten to use assert.retry(...) instead with slightly adjusted callback functions.
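
As a rough illustration of what a "slightly adjusted callback" could look like, the sketch below converts a hand-rolled retry loop into the predicate shape that assert.retry works with: a callback that returns true on success and false to request another attempt. It assumes the call shape assert.retry(func, msg, num_attempts, intervalMS). The names retryStandIn, flakyCommand and runFlakyCommandWithRetry are hypothetical, and the stand-in exists only so the sketch also runs outside the shell; in a jstest the real assert.retry would be called instead.

```javascript
// Simplified stand-in modeled on assert.retry(func, msg, num_attempts, intervalMS).
// Assumption: this mirrors the shell helper closely enough for illustration only.
function retryStandIn(func, msg, numAttempts, intervalMS) {
    // Use the shell's global sleep() when present, otherwise do not wait.
    const doSleep = (typeof sleep === "function") ? sleep : () => {};
    for (let attempt = 1; attempt <= numAttempts; attempt++) {
        if (func()) {
            return;
        }
        doSleep(intervalMS || 0);
    }
    throw new Error(msg);
}

// Hypothetical command that fails twice with a transient error, then succeeds.
let calls = 0;
function flakyCommand() {
    calls++;
    if (calls < 3) {
        return {ok: 0, code: 6, codeName: "HostUnreachable"};
    }
    return {ok: 1, n: 42};
}

// The adjusted callback returns true on success and false to request a retry.
// The command result is stashed in a closure variable because the retry
// helper itself does not return it.
function runFlakyCommandWithRetry() {
    let result;
    retryStandIn(() => {
        result = flakyCommand();
        return result.ok === 1;
    }, "flakyCommand did not succeed", 5, 0);
    return result;
}
```

Calling runFlakyCommandWithRetry() here invokes flakyCommand three times and returns the successful response. The closure variable is the workaround that a result-returning retry helper would make unnecessary.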

      Having fewer retry implementations would help us reduce the size of the test codebase. It would also help us avoid reinventing the wheel the next time a function with retries is needed.

      The current assert.retry function is already quite powerful, but to handle most of the use cases currently covered by custom retry implementations, it would need to be generalized and offer the following capabilities:

      • return the result of the (retried) callback function
      • optionally use exponential backoff for waiting, or accept a function that calculates the next wait time
      • express the retry limit as a maximum amount of time to retry instead of a maximum number of retry attempts
      • provide an easy way to determine whether a retry is warranted upon error, e.g. by accepting a list of error codes upon which to retry, and failing otherwise
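
The capabilities listed above could be combined roughly as in the following sketch. This is not an existing API: retryWithOptions and every option name (maxAttempts, maxTimeMS, initialWaitMS, nextWaitMS, retryableCodes, sleepFn, nowFn) are hypothetical, chosen only to show how the four requirements might fit together in one helper. It also assumes that failures surface as thrown errors carrying a numeric code property.

```javascript
// Hypothetical generalized retry helper; none of these option names exist in assert.js.
function retryWithOptions(func, {
    maxAttempts = 10,                             // bound on the number of attempts
    maxTimeMS = Infinity,                         // bound on total elapsed time
    initialWaitMS = 100,
    nextWaitMS = (prevWaitMS) => prevWaitMS * 2,  // exponential backoff by default
    retryableCodes = [],                          // error codes that warrant a retry
    // Use the shell's global sleep() when present, otherwise do not wait.
    sleepFn = (typeof sleep === "function") ? sleep : () => {},
    nowFn = Date.now,
} = {}) {
    const start = nowFn();
    let waitMS = initialWaitMS;
    for (let attempt = 1;; attempt++) {
        try {
            // Return the callback's result to the caller on success.
            return func(attempt);
        } catch (e) {
            const retryable = retryableCodes.includes(e.code);
            const attemptsLeft = attempt < maxAttempts;
            const timeLeft = (nowFn() - start) + waitMS <= maxTimeMS;
            if (!retryable || !attemptsLeft || !timeLeft) {
                throw e;  // not retryable, or a limit was reached
            }
            sleepFn(waitMS);
            waitMS = nextWaitMS(waitMS);
        }
    }
}

// Usage: retry only on code 6 (HostUnreachable), recording the waits instead of sleeping.
let demoAttempts = 0;
const demoWaits = [];
const demoResult = retryWithOptions(() => {
    demoAttempts++;
    if (demoAttempts < 3) {
        const err = new Error("transient failure");
        err.code = 6;
        throw err;
    }
    return "done";
}, {retryableCodes: [6], initialWaitMS: 10, sleepFn: (ms) => demoWaits.push(ms)});
```

In this run the callback fails twice and succeeds on the third attempt, so demoResult is "done" and demoWaits is [10, 20]. Injecting sleepFn and nowFn is a design choice that keeps the helper testable without real delays; an error with a code outside retryableCodes is rethrown immediately.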

      A few other features are potentially still missing from a generic retry implementation that would cover 95% of current use cases.

            Assignee:
            Unassigned
            Reporter:
            Jan Steemann
            Votes:
            0
            Watchers:
            4