[SERVER-46842] resmoke.py shouldn't run data consistency checks in stepdown suites if a process has crashed Created: 13/Mar/20 Updated: 29/Oct/23 Resolved: 22/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.1, 4.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Mikhail Shchatko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp, tig-resmoke | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Backport Requested: |
v4.4
|
||||
| Sprint: | STM 2020-03-23, STM 2020-04-20, STM 2020-05-04 | ||||
| Participants: | |||||
| Story Points: | 1 | ||||
| Description |
|
resmoke.py ordinarily checks that a test didn't cause the server to crash by calling self.fixture.is_running() after the test finishes. However, due to the stepdown thread and the job thread only being synchronized by calling ContinuousStepdown.after_test(), it isn't safe to check whether the fixture is still running immediately after the test finishes.
Skipping this check causes resmoke.py to continue to run the other data consistency checks, even when a process in the MongoDB cluster has crashed. This is misleading for Server engineers, who end up clicking on the "wrong" link in Evergreen for the task failure, and it also has a severe negative impact on our automated log extraction tool by preventing it from finding relevant information. We should ensure process crashes in test suites using the ContinuousStepdown hook prevent other tests and hooks from running. I suspect having _StepdownThread.pause() check that the fixture is still running as the last thing it does would accomplish this. |
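The suggestion above could look roughly like the following. This is only a minimal sketch, not the actual resmoke.py code: `FakeFixture`, `StepdownThreadSketch`, `ServerFailure`, and the `after_test()` helper are hypothetical stand-ins, and the only behaviour assumed from the description is that `pause()` finishes by calling `is_running()` on the fixture.

```python
import threading


class ServerFailure(Exception):
    """Hypothetical error type signalling that a cluster process has crashed."""


class FakeFixture:
    """Hypothetical stand-in for a resmoke.py fixture; only is_running() is modelled."""

    def __init__(self):
        self.alive = True

    def is_running(self):
        return self.alive


class StepdownThreadSketch:
    """Simplified model of the stepdown thread's pause logic (no real thread is started)."""

    def __init__(self, fixture):
        self._fixture = fixture
        self._is_idle_evt = threading.Event()
        self._is_idle_evt.set()  # assumption: no stepdown is in progress

    def pause(self):
        # Wait until any in-flight stepdown has finished.
        self._is_idle_evt.wait()
        # Last step of pause(): fail if a process in the cluster has crashed.
        if not self._fixture.is_running():
            raise ServerFailure("fixture is not running after pausing stepdowns")


def after_test(stepdown_thread, later_hooks):
    """Hypothetical hook runner: later hooks only run if pause() succeeds."""
    stepdown_thread.pause()
    for hook in later_hooks:
        hook()
```

With this shape, an exception raised from `pause()` propagates out of the hook before any later data consistency check is invoked, so the crash itself would be the reported failure.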
| Comments |
| Comment by Githook User [ 04/Jun/20 ] | ||
|
Author: {'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com'} Message: (cherry picked from commit 40801001754b6bdc15bd2f59eae523c59b6ff055) | ||
| Comment by Siyuan Zhou [ 04/Jun/20 ] | ||
|
Awesome. Thank you! | ||
| Comment by Robert Guo (Inactive) [ 04/Jun/20 ] | ||
|
siyuan.zhou Done! backport is in the commit queue | ||
| Comment by Siyuan Zhou [ 29/May/20 ] | ||
|
mikhail.shchatko and robert.guo, do you have plans to backport this to 4.4? I found the test change in my patch of | ||
| Comment by Githook User [ 28/Apr/20 ] | ||
|
The following changes were intended for Message: (cherry picked from commit ef75364ada70eaf4a096ed07adfeb3175abd719b) | ||
| Comment by Githook User [ 22/Apr/20 ] | ||
|
Author: {'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com'} Message: | ||
| Comment by Ian Whalen (Inactive) [ 13/Mar/20 ] | ||
|
BFG-555889 had already been run through the bot analyzer, which extracted some unhelpful logs from the CheckReplDBHash failure but missed the following from the test logs:
| ||
| Comment by Max Hirschhorn [ 13/Mar/20 ] | ||
|
Ian was the one doing the screen share so I'd want to double check with him / his browser history on the specific ones we were looking through. | ||
| Comment by David Bradford (Inactive) [ 13/Mar/20 ] | ||
|
For the cases where the description was empty, do you know if they had the bot-analyzed label? Due to the volume of BFGs coming in lately, the log analysis worker has been thousands of BFGs behind all this week. It didn't get caught up until today. |