Enable "retry_on_failure" on Evergreen test tasks

    • Type: Task
    • Resolution: Done
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: None
    • Go Drivers
    • Not Needed

      Context

      We have a number of flaky tests that make tasks fail intermittently. The vast majority of the time, the tests pass on retry. We can use Evergreen's retry_on_failure feature to automatically restart a failed task, at most once.

      From the Evergreen docs:

      This is only recommended for commands that are known to be flaky, or fail intermittently. In order to prevent overuse of this feature, the number of times a single task can be automatically restarted on failure is limited to 1 time, and a given project may only automatically restart a maximum of 200 tasks in a given 24-hour period.

      Definition of done

      • Enable "retry_on_failure" on test run commands.
      • Check what Evergreen shows when a retry happens (for example, the task status and logs of the failed attempt). Document the findings here.
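
      As a rough sketch of the first item, retry_on_failure is set as a field on the command itself in the Evergreen project YAML. The function name "run-tests" and the "make test" entry point below are hypothetical placeholders, not taken from the actual project config, so substitute the real test function and script.

      ```yaml
      functions:
        # "run-tests" is a hypothetical function name used for illustration.
        run-tests:
          - command: subprocess.exec
            type: test
            # Ask Evergreen to restart the task automatically if this command fails.
            # Per the docs quoted above, a task is restarted at most once.
            retry_on_failure: true
            params:
              binary: bash
              # Hypothetical test entry point; replace with the project's real one.
              args: ["-c", "make test"]
      ```

      Setting the field only on the test run commands, rather than on setup or teardown commands, keeps the project further from the limit of 200 automatic restarts per 24-hour period.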

      Pitfalls

      • We may not notice when an intermittent test failure is caused by a real bug, because the retry masks it.
      • The failed first attempt may not be logged anywhere, hiding the failure completely. That's probably not the case, but we need to verify it.

              Assignee:
              Matt Dale
              Reporter:
              Matt Dale
