- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Testing Infrastructure
- Decision Automation Group
- ALL
In a patch build we observed a task timeout despite everything seemingly completing successfully. In the task logs we can see first that burn_in_tests decides to run the test one more time:
[2020/06/10 03:51:27.320] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:51:27.320+0000 Requeueing test jstests/core/command_let_variables.js 60 of (2/1000/600.00 min/max/time), cumulative time elapsed 588.15
The previous run took only about 2 seconds (588.15 - 586.06):
[2020/06/10 03:51:25.222] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:51:25.222+0000 Requeueing test jstests/core/command_let_variables.js 59 of (2/1000/600.00 min/max/time), cumulative time elapsed 586.06
Then the latest run hits "CleanEveryN" and decides to restart the cluster. This unfortunately takes a very long time:
[2020/06/10 03:55:12.948] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.948+0000 command_let_variables:CleanEveryN ran in 223.59 seconds: no failures detected.
[2020/06/10 03:55:12.952] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.952+0000 Running job0_fixture_teardown...
[2020/06/10 03:55:12.990] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.989+0000 Writing output of job0_fixture_teardown to https://logkeeper.mongodb.org/build/d317c125e63d30f0f698745c7d9acd82/test/5ee059a154f2480b9575c692.
[2020/06/10 03:56:31.414] Command stopped early: context canceled
I believe that last line is where Evergreen decided the task had timed out and ran the hang analyzer. CleanEveryN ran for 223.59 seconds (just under 4 minutes), which puts the test roughly 3.5 minutes over the 600-second budget that burn_in_tests allocated. The task timeout appears to be 17 minutes, so I suspect that we need to either increase the task timeout or have burn_in_tests abort the last iteration of the test if it exceeds the time budget.
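To make the arithmetic concrete, here is a minimal Python sketch of the second option: a requeue check that also counts an expected hook cost against the budget. The function name `should_requeue` and its parameters are hypothetical and are not taken from resmoke or burn_in_tests. It also assumes the caller can estimate the hook runtime in advance. The numbers come from the log lines above.

```python
def should_requeue(elapsed_secs, runs_done, min_runs, max_runs, budget_secs,
                   est_run_secs, est_hook_secs=0.0):
    """Return True if one more repetition is expected to fit in the budget.

    Hypothetical helper: est_hook_secs is the caller's estimate of hook
    overhead (for example a CleanEveryN restart) for the next repetition.
    """
    if runs_done >= max_runs:
        return False  # hard cap on repetitions
    if runs_done < min_runs:
        return True   # always honor the minimum repetition count
    projected = elapsed_secs + est_run_secs + est_hook_secs
    return projected <= budget_secs


# Values read from the logs: (2/1000/600.00 min/max/time), 588.15 s elapsed,
# about 2.09 s per run, and a 223.59 s CleanEveryN hook.
elapsed, per_run, hook, budget = 588.15, 2.09, 223.59, 600.0

# Ignoring the hook, another run appears to fit in the budget.
ignores_hook = should_requeue(elapsed, 60, 2, 1000, budget, per_run)

# Counting the hook, the run is projected to overshoot and is skipped.
counts_hook = should_requeue(elapsed, 60, 2, 1000, budget, per_run, hook)

# Projected overrun past the 600 s budget, in seconds (about 3.5 minutes).
overrun = elapsed + per_run + hook - budget
print(ignores_hook, counts_hook, round(overrun, 2))
```

With these inputs the check without the hook estimate allows the extra run, while the check that includes it does not. The projected overrun of roughly 214 seconds matches the 3.5 minutes noted above.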
is duplicated by:
- SERVER-53058 Better account of CleanEveryN runtime when setting dynamic timeouts (Closed)