Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Testing Infrastructure
Labels:
None

Assigned Teams:

Decision Automation Group
Operating System:
ALL
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

In a patch build we observed a task timeout despite everything seemingly completing successfully. In the task logs we can see first that burn_in_tests decides to run the test one more time:

[2020/06/10 03:51:27.320] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:51:27.320+0000 Requeueing test jstests/core/command_let_variables.js 60 of (2/1000/600.00 min/max/time), cumulative time elapsed 588.15

You can see the previous run took only ~2 seconds (588 - 586):

[2020/06/10 03:51:25.222] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:51:25.222+0000 Requeueing test jstests/core/command_let_variables.js 59 of (2/1000/600.00 min/max/time), cumulative time elapsed 586.06

Then the latest run hits "CleanEveryN" and decides to restart the cluster. This unfortunately takes a very long time:

[2020/06/10 03:55:12.948] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.948+0000 command_let_variables:CleanEveryN ran in 223.59 seconds: no failures detected.
 [2020/06/10 03:55:12.952] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.952+0000 Running job0_fixture_teardown...
 [2020/06/10 03:55:12.990] [executor:multi_stmt_txn_passthrough:job0] 2020-06-10T03:55:12.989+0000 Writing output of job0_fixture_teardown to https://logkeeper.mongodb.org/build/d317c125e63d30f0f698745c7d9acd82/test/5ee059a154f2480b9575c692.
 [2020/06/10 03:56:31.414] Command stopped early: context canceled

I believe that last line is where evergreen decided it was a timeout and runs the hang analyzer. CleanEveryN ran in 223.59 seconds or approximately 4 minutes, which brings it to approximately 3.5 minutes over the budget of 600s that burn_in_tests allocated. The task timeout appears to be 17 minutes, so I suspect that we need to either increase the task timeout or have burn_in_tests abort the last iteration of the test if it exceeds the time budget.

is duplicated by

SERVER-53058 Better account of CleanEveryN runtime when setting dynamic timeouts

Closed

Assignee:: [DO NOT ASSIGN] Backlog - Decision Automation Group (DAG) (Inactive)
Reporter:: Charlie Swanson
Participants:: [DO NOT ASSIGN] Backlog - Decision Automation Group (DAG), Charlie Swanson, David Bradford, Vlad Rachev
Votes:: 0 Vote for this issue
Watchers:: 7 Start watching this issue

Created:: Jun 10 2020 06:03:10 PM UTC
Updated:: Dec 06 2022 02:24:36 AM UTC
Resolved:: Mar 29 2021 08:42:42 PM UTC

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates