[SERVER-35529] Rollback fuzzer suites do not need to shutdown nodes in states where rollbacks do not occur
Created: 11/Jun/18 Updated: 06/Dec/22
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | gm-ack, rollback-fuzzer, stm |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Server Tooling & Methods |
| Linked BF Score: | 16 |
| Description |
The rollback_fuzzer_unclean_shutdowns and rollback_fuzzer_clean_shutdowns suites trigger rollbacks via the RollbackTest fixture, but inject clean and unclean node restarts at random points within the test. The RollbackTest fixture models a test execution as a state machine. In each state, we expect each node and the overall replica set topology to be in a particular state, which should (for the most part) not change until we execute the next state transition. There are five states of the RollbackTest:
The main goal of running the RollbackFuzzer with node shutdowns was to exercise (1) shutting down a node while it is in the process of rollback or recovery, and (2) shutting down a node that is currently serving as the sync source for a node undergoing rollback. Both goals can still be achieved without injecting node restarts into the kRollbackOps or kSyncSourceOpsBeforeRollback states. The only state we really need to inject restarts into is kSyncSourceOpsDuringRollback, since this is the state in which rollback may be occurring. Removing unnecessary restarts should, ideally, reduce test run times and make debugging easier, since the logs will no longer include the many extra state transitions introduced by replica set node restarts. An easy way to implement this would be to make the restartNode function a no-op when RollbackTest is in certain states. Alternatively, we could update the fuzzer to add restart commands only at certain points in the test.
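A minimal sketch of the two options above, written in TypeScript purely for illustration. It assumes a simplified model rather than the real RollbackTest API: `StateGatedRestarter`, its `doRestart` callback, `RESTART_STATES`, `FuzzCommand`, and `dropUnneededRestarts` are all hypothetical names, and only the three states named in this ticket are modeled. Option 1 gates `restartNode` on the current state so it becomes a no-op outside kSyncSourceOpsDuringRollback; option 2 filters restart commands out of the fuzzer-generated command list before the test runs.

```typescript
// Hypothetical model: only the three RollbackTest states named in this ticket.
// The real fixture has five states; the other two are not modeled here.
type RollbackTestState =
  | "kRollbackOps"
  | "kSyncSourceOpsBeforeRollback"
  | "kSyncSourceOpsDuringRollback";

// States in which an injected restart can overlap an in-progress rollback.
const RESTART_STATES: ReadonlySet<RollbackTestState> = new Set<RollbackTestState>([
  "kSyncSourceOpsDuringRollback",
]);

// Option 1 (hypothetical stand-in for the fixture): restartNode is a no-op
// unless the test is currently in one of RESTART_STATES.
class StateGatedRestarter {
  private state: RollbackTestState = "kRollbackOps";
  private readonly doRestart: (nodeId: number) => void; // assumed restart hook
  readonly skipped: number[] = []; // node ids whose restart was suppressed

  constructor(doRestart: (nodeId: number) => void) {
    this.doRestart = doRestart;
  }

  transitionTo(next: RollbackTestState): void {
    this.state = next;
  }

  // Returns true only if the node was actually restarted.
  restartNode(nodeId: number): boolean {
    if (!RESTART_STATES.has(this.state)) {
      this.skipped.push(nodeId);
      return false;
    }
    this.doRestart(nodeId);
    return true;
  }
}

// Option 2 (hypothetical command format): the fuzzer drops restart commands
// generated for states where no rollback can be in progress.
interface FuzzCommand {
  kind: "op" | "restart";
  state: RollbackTestState;
}

function dropUnneededRestarts(cmds: readonly FuzzCommand[]): FuzzCommand[] {
  return cmds.filter((c) => c.kind !== "restart" || RESTART_STATES.has(c.state));
}
```

One possible trade-off: option 1 leaves the generated tests unchanged and can record the suppressed calls for debugging, while option 2 removes the restart commands entirely, so they never appear in the test file or its logs.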
| Comments |
| Comment by Steven Vannelli [ 10/May/22 ] |
Moving this ticket to the Backlog and removing the "Backlog" fixVersion as per our latest policy for using fixVersions.
| Comment by Ian Whalen (Inactive) [ 09/Jan/20 ] |
Possibly of value, but not worth picking up until we have completed some prior improvements in the rollback fuzzer (coverage-guided fuzzer).
| Comment by Gregory McKeon (Inactive) [ 12/Jun/18 ] |
Send this to TIG to see if they think it's worth doing.
| Comment by William Schultz (Inactive) [ 11/Jun/18 ] |
I also believe that this change will help mitigate the symptoms in BF-9145.