Core Server / SERVER-35058

Don't only rely on heartbeat to signal secondary positions in stepdown command

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.7, 4.0.2, 4.1.1
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      Repl 2018-06-04, Repl 2018-06-18, Repl 2018-07-02

      Description

      The replSetStepDown command waits for a majority of nodes to catch up and for one of them to be an eligible candidate, but that condition is signaled only when heartbeat responses are processed, which adds delay to the handoff.

      The easiest but least efficient fix is to signal the condition variable whenever we update the last applied optime. A better solution is to replace the condition variable with a waiter in _replicationWaiterList, as in _awaitReplication_inlock(). A third option is to call _awaitReplication_inlock() directly, which might not be desirable because the condition the stepdown command waits on is slightly different from w: majority + an eligible candidate specified in config.
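
      The first option can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: the class ReplCoordinatorSketch, its methods, and the use of plain integers as optimes are all hypothetical names and simplifications chosen for illustration. It shows a stepdown waiter blocked on a condition variable that is notified on every position update, so the waiter re-checks "majority caught up and at least one of them is electable" without waiting for the next heartbeat.

```python
import threading


class ReplCoordinatorSketch:
    """Toy model of the stepdown wait; all names here are hypothetical."""

    def __init__(self, members, electable):
        self._cond = threading.Condition()
        # Last applied optime per member, modeled as a plain integer.
        self._last_applied = {m: 0 for m in members}
        self._electable = set(electable)

    def _stepdown_ok(self, target):
        # Members whose last applied optime has reached the target.
        caught_up = {m for m, op in self._last_applied.items() if op >= target}
        majority = len(self._last_applied) // 2 + 1
        # Need a majority caught up AND at least one electable node among them.
        return len(caught_up) >= majority and bool(caught_up & self._electable)

    def update_position(self, member, optime):
        with self._cond:
            current = self._last_applied[member]
            self._last_applied[member] = max(current, optime)
            # Signal on every position update, not only on heartbeat responses.
            self._cond.notify_all()

    def wait_for_stepdown(self, target, timeout):
        with self._cond:
            # wait_for re-evaluates the predicate each time it is notified.
            return self._cond.wait_for(lambda: self._stepdown_ok(target), timeout)
```

      In this sketch every position update wakes the waiter, which is simple but causes wakeups that often find the condition still false. That cost is what the ticket calls less efficient, and it is the motivation for registering a targeted waiter instead.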

    People

    • Votes: 0
    • Watchers: 7