  1. Core Server
  2. SERVER-42602

Guarantee that unconditional step down will not happen due to slow node restarts in rollback_fuzzer_[un]clean_shutdowns suites.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.2.1, 4.3.1
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Fully Compatible
    • ALL
    • v4.2
    • Repl 2019-08-26, Repl 2019-09-09
    • 19

      Rollback fuzzer test suites run in two phases.

      1. State Transition Phase - RollbackTest transitions to a predefined state before the rollback fuzzer enters the workload execution phase.
      2. Workload Execution Phase - the rollback fuzzer executes a list of random commands on the replica set (including the restartNode command, which can result in a change of primary).

      Once RollbackTest transitions to the "transitionToSyncSourceOperationsDuringRollback" state, the assumption mentioned here in the rollback fuzzer no longer holds. After that transition, the topology looks like this:

      [CurSecondary/Node to be rolled back]
              |
              |
              |
      [CurPrimary]-------- [TieBreakerNode]

      Once the curSecondary node has rolled back successfully (i.e., it has caught up to curPrimary), restarting curPrimary can result in curSecondary becoming the new primary. As a result, during the workload execution phase, an unconditional step down can happen because of slow planned node restarts (i.e., node restarts that take a long time). That leads to undesired behavior in the rollback_fuzzer_[un]clean_shutdowns suites. To fix the issue, we should enforce the following two contracts.
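
The reasoning above can be sketched with a small connectivity model. This is only an illustration, not code from the rollback fuzzer: the function names, the node labels, and the simplification that a primary steps down as soon as it cannot reach a majority of voters are all assumptions (in a real replica set the election timeout also matters).

```python
from typing import FrozenSet, Set

# Hypothetical node labels taken from the topology diagram.
CUR_SECONDARY = "curSecondary"
CUR_PRIMARY = "curPrimary"
TIE_BREAKER = "tieBreaker"

ALL_NODES: Set[str] = {CUR_SECONDARY, CUR_PRIMARY, TIE_BREAKER}

# Links present after transitionToSyncSourceOperationsDuringRollback:
# curSecondary is connected only to curPrimary.
LINKS: Set[FrozenSet[str]] = {
    frozenset((CUR_SECONDARY, CUR_PRIMARY)),
    frozenset((CUR_PRIMARY, TIE_BREAKER)),
}


def reachable_voters(node: str, up: Set[str], links: Set[FrozenSet[str]]) -> int:
    """Count the voters a node can reach directly, including itself."""
    if node not in up:
        return 0
    peers = sum(
        1 for other in up
        if other != node and frozenset((node, other)) in links
    )
    return 1 + peers


def has_majority(node: str, up: Set[str], links: Set[FrozenSet[str]]) -> bool:
    """True if the node can see a strict majority of the voting members."""
    return reachable_voters(node, up, links) > len(ALL_NODES) // 2


# With curSecondary as primary, a restart of curPrimary leaves it alone.
during_restart = ALL_NODES - {CUR_PRIMARY}
print(has_majority(CUR_SECONDARY, ALL_NODES, LINKS))
print(has_majority(CUR_SECONDARY, during_restart, LINKS))
```

In this model curSecondary can win an election while every node is up, because it reaches curPrimary. It loses its majority for the whole duration of any later curPrimary restart, which is the window in which a slow restart can cause an unconditional step down.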

      1) During the workload execution phase, an unconditional step down can happen only because of transient network issues, not because of slow planned node restarts (i.e., node restarts that take a long time).

      2) A node restart issued by the rollback fuzzer can change the original primary only if all three nodes are connected.
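
Contract 2 can be read as a precondition on the topology. The sketch below is one hypothetical way to express that precondition. It is not the actual fix, and `fully_connected` and `restart_may_change_primary` are illustrative names rather than rollback fuzzer APIs.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def fully_connected(nodes: Iterable[str], links: Set[FrozenSet[str]]) -> bool:
    """True if every pair of nodes has a direct link between them."""
    return all(frozenset(pair) in links for pair in combinations(nodes, 2))


def restart_may_change_primary(nodes: Iterable[str],
                               links: Set[FrozenSet[str]]) -> bool:
    """Contract 2: a restart may move the primary only in a fully connected set."""
    return fully_connected(list(nodes), links)


nodes = ["curSecondary", "curPrimary", "tieBreaker"]

# Topology after transitionToSyncSourceOperationsDuringRollback:
# curSecondary and tieBreaker are not connected to each other.
partitioned = {
    frozenset(("curSecondary", "curPrimary")),
    frozenset(("curPrimary", "tieBreaker")),
}

# The same replica set once the missing link has been restored.
healed = partitioned | {frozenset(("curSecondary", "tieBreaker"))}

print(restart_may_change_primary(nodes, partitioned))
print(restart_may_change_primary(nodes, healed))
```

A test harness could consult a check like this before a restartNode command and, when it returns false, expect the original primary to be unchanged afterwards.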

            Assignee: Suganthi Mani (suganthi.mani@mongodb.com)
            Reporter: Suganthi Mani (suganthi.mani@mongodb.com)