[SERVER-35529] Rollback fuzzer suites do not need to shutdown nodes in states where rollbacks do not occur Created: 11/Jun/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: William Schultz (Inactive)
Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Unresolved
Votes: 0
Labels: gm-ack, rollback-fuzzer, stm
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Server Tooling & Methods
Participants:
Linked BF Score: 16

 Description   

The rollback_fuzzer_unclean_shutdowns and rollback_fuzzer_clean_shutdowns suites trigger rollbacks via the RollbackTest fixture, and also inject clean or unclean node restarts at random points within the test. The RollbackTest fixture models a test execution as a state machine. In each state, we expect each node and the overall replica set topology to be in a particular condition, which should (for the most part) not change until we execute the next state transition. RollbackTest has five states:

  • kStopped - Test is no longer running.
  • kRollbackOps - Old primary is isolated from a secondary. Writes done to it will be rolled back.
  • kSyncSourceOpsBeforeRollback - New primary has been elected and can take writes.
  • kSyncSourceOpsDuringRollback - Rollback on old primary should be in progress.
  • kSteadyStateOps - Rollback should have completed and replica set should now be in steady state.

The goal of running the RollbackFuzzer with node shutdowns was mainly to exercise (1) shutting down a node while it is in the process of rollback or recovery and (2) shutting down a node that is currently being used as a sync source for a node undergoing rollback. These goals can still be achieved without injecting node restarts into states kRollbackOps or kSyncSourceOpsBeforeRollback. The only state we really need to inject restarts into is kSyncSourceOpsDuringRollback, since this is the state where rollback may be occurring.

Removing unnecessary restarts should reduce test times and make debugging easier, since the logs will no longer include the many extra state transitions introduced by replica set node restarts.

An easy way to implement this would be to make the restartNode function a no-op when RollbackTest is in certain states. Alternatively, we could update the fuzzer to only add restart commands at certain points in the test.
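
As a rough illustration of the two options above, here is a minimal, self-contained sketch. It does not use the real RollbackTest fixture or fuzzer code: makeGuardedRestart, stripUnneededRestarts, the command object shape ({type, to, node}), and the assumption that a test starts in kSteadyStateOps are all hypothetical and chosen only for this example. Only the state names come from this ticket.

```javascript
// State names as listed in this ticket; the object itself is a stand-in,
// not the fixture's actual definition.
const State = Object.freeze({
    kStopped: "kStopped",
    kRollbackOps: "kRollbackOps",
    kSyncSourceOpsBeforeRollback: "kSyncSourceOpsBeforeRollback",
    kSyncSourceOpsDuringRollback: "kSyncSourceOpsDuringRollback",
    kSteadyStateOps: "kSteadyStateOps",
});

// Assumption: restarts are only useful while rollback may be in progress.
const kRestartableStates = new Set([State.kSyncSourceOpsDuringRollback]);

// Option 1 (hypothetical helper): wrap a restart function so that it does
// nothing unless the test is in a state where a restart is worth exercising.
// Returns true if the restart ran and false if it was skipped.
function makeGuardedRestart(getCurrentState, doRestart) {
    return function restartNode(nodeId, signal) {
        if (!kRestartableStates.has(getCurrentState())) {
            return false;
        }
        doRestart(nodeId, signal);
        return true;
    };
}

// Option 2 (hypothetical helper): post-process a generated command list and
// drop restart commands that fall in states where they are not needed.
// Assumes commands look like {type: "transition", to: <state>} or
// {type: "restartNode", node: <id>}, and that the test starts in steady state.
function stripUnneededRestarts(commands, initialState = State.kSteadyStateOps) {
    let state = initialState;
    const kept = [];
    for (const cmd of commands) {
        if (cmd.type === "transition") {
            state = cmd.to;
        } else if (cmd.type === "restartNode" && !kRestartableStates.has(state)) {
            continue;  // Skip restarts outside the restartable states.
        }
        kept.push(cmd);
    }
    return kept;
}
```

The first option leaves generated tests unchanged and only alters runtime behavior, while the second keeps the skipped restarts out of the generated test entirely, which may make the test file itself easier to read when debugging.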



 Comments   
Comment by Steven Vannelli [ 10/May/22 ]

Moving this ticket to the Backlog and removing the "Backlog" fixVersion as per our latest policy for using fixVersions.

Comment by Ian Whalen (Inactive) [ 09/Jan/20 ]

Possibly of value, but not worth picking up until we have completed some prior improvements in the rollback fuzzer (coverage-guided fuzzer).

Comment by Gregory McKeon (Inactive) [ 12/Jun/18 ]

Send this to TIG to see if they think it's worth doing.

Comment by William Schultz (Inactive) [ 11/Jun/18 ]

I also believe that this change will help mitigate the symptoms in BF-9145.

Generated at Thu Feb 08 04:40:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.