-
Type:
Task
-
Resolution: Done
-
Priority:
Unknown
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Go Drivers
-
Not Needed
-
None
-
None
-
None
-
None
-
None
-
None
Context
We have a number of flaky tests that make tasks fail intermittently. The vast majority of the time, the tests pass on retry. We can use the retry_on_failure feature to retry commands exactly once.
From the Evergreen docs:
This is only recommended for commands that are known to be flaky, or fail intermittently. In order to prevent overuse of this feature, the number of times a single task can be automatically restarted on failure is limited to 1 time, and a given project may only automatically restart a maximum of 200 tasks in a given 24-hour period.
Definition of done
- Enable "retry_on_failure" on test run commands.
- Check what happens when a retry happens. Document it here.
Pitfalls
- We may not notice when a test fails intermittently but masks a real bug.
- The retried command may not be logged anywhere, hiding it completely. That's probably not the case, but we need to test it.