[SERVER-49416] rollback_fuzzer_clean_shutdowns suite may perform unclean shutdowns Created: 10/Jul/20 Updated: 29/Oct/23 Resolved: 24/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||||||||||||||||||||
| Sprint: | Repl 2020-08-10 | ||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||
| Description |
|
The rollback_fuzzer_clean_shutdowns suite is supposed to only shut down nodes cleanly i.e. with a SIGTERM signal. This is ensured in the rollback test fixture by converting any signal to a SIGTERM if the TestData.allowUncleanShutdowns parameter is not set, which is the default. However, in stopMongoProgram we set the allowUncleanShutdown setting to true if the given signal is not equal to MongoRunner.EXIT_CLEAN. So, after we do an initial clean shutdown in a generated fuzzer test, subsequent shutdowns are allowed to be unclean i.e. use SIGKILL. |
| Comments |
| Comment by Githook User [ 24/Jul/20 ] |
|
Author: {'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}Message: |
| Comment by William Schultz (Inactive) [ 24/Jul/20 ] |
|
In a patch build that includes the fix for this bug, a number of validation errors appeared in tests due to fast count mismatches. I expect these were masked by the existing code since we skipped fast count validation whenever a test did any shutdown, regardless of whether it was clean or unclean. There are a number of failures in noPassthrough related to backup restore, and a couple in replica_sets related to prepared transactions. |
| Comment by William Schultz (Inactive) [ 10/Jul/20 ] |
Agreed. I think we just need to change MongoRunner.EXIT_CLEAN to SIGTERM in that condition. I expect that's what the original ticket was trying to achieve, but mixed up the signal and exit codes. |
| Comment by Max Hirschhorn [ 10/Jul/20 ] |
|
I believe the confusion is MongoRunner.EXIT_CLEAN is an exit code and SIGTERM is a signal for shutting down. A mongod which receives a SIGTERM and cleanly shuts down should exit with code 0 (= MongoRunner.EXIT_CLEAN). |
| Comment by Gregory Noma [ 10/Jul/20 ] |
|
william.schultz The itent of that condition was so that we don't enforce fast counts during validation if there was an unclean shutdown during the test (here and here), since in that case we expect fast counts to be inaccurate. Perhaps in that case, we should set TestData.skipEnforceFastCountOnValidate instead of TestData.allowUncleanShutdowns? When do we expect SIGTERM (15) versus EXIT_CLEAN (0)? |
| Comment by William Schultz (Inactive) [ 10/Jul/20 ] |
|
Note that this isn't a critical problem as far as test coverage goes, since we are still getting coverage of both clean and unclean shutdowns. It just makes debugging more difficult if your assumption is that only clean shutdowns are occurring in this suite. |
| Comment by William Schultz (Inactive) [ 10/Jul/20 ] |
|
We can see this occurring in the logs of any recent rollback_fuzzer_clean_shutdowns Evergreen run as well. A regex search for Stopping node.*with signal 9 turns up the issue. |
| Comment by William Schultz (Inactive) [ 10/Jul/20 ] |
|
The problematic behavior in stopMongoProgram was introduced in this commit. It seems that this condition may be checking for the wrong signal code, since MongoRunner.EXIT_CLEAN is 0. Perhaps the correct condition may be to check if signal !== 15 i.e. check for SIGTERM shutdown signal. I'm not sure exactly what the original intention of that logic was, though. gregory.noma |