Core Server / SERVER-53431

Server should respond to running operations with the appropriate topologyVersion on stepdown


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9.0, 4.4.5, 4.2.16
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4, v4.2
    • Sprint:
      Repl 2021-01-25, Repl 2021-02-08
    • Linked BF Score:
      0

      Description

      1. We kill operations at the beginning of stepdown: calling AutoGetRstlForStepUpStepDown starts the killOp thread.
      2. We start killing user operations before we disable writes on the primary and before we transition the server to SECONDARY (these are the steps that update the server description and trigger a topologyVersion bump).
      3. The error response for a killed operation therefore carries a topologyVersion that hasn't been incremented yet.

      Since the topologyVersion is not incremented, the driver will try to reselect the same server to run the command even though it may still be in the process of stepping down.
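
      To illustrate why an unchanged topologyVersion leads to reselection, here is a minimal Python sketch of the kind of staleness comparison a driver performs on state-change errors, loosely modeled on the Server Discovery and Monitoring specification. The names TopologyVersion and is_stale_error are hypothetical and the logic is simplified; this is not any driver's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopologyVersion:
    # Hypothetical stand-in: a real processId is an ObjectId, not a string.
    process_id: str
    counter: int


def is_stale_error(current: Optional[TopologyVersion],
                   error: Optional[TopologyVersion]) -> bool:
    """Return True if the error carries no newer information than the
    driver's current view of the server (simplified model)."""
    if current is None or error is None:
        # Without both versions the driver cannot prove staleness.
        return False
    if current.process_id != error.process_id:
        # A different process means the server restarted; never stale.
        return False
    # Same process: the error is stale unless its counter is strictly newer.
    return current.counter >= error.counter


# The driver last saw counter 5; the killed operation's error also says 5.
driver_view = TopologyVersion("proc-a", 5)
killed_op_error = TopologyVersion("proc-a", 5)
print(is_stale_error(driver_view, killed_op_error))  # error is ignored
```

      Under this model, an error whose counter equals the driver's current value is treated as stale, so the server description is left unchanged and the server remains eligible for selection.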

      We can consider adding an extra topologyVersion increment before scheduling the killOps (we already increment the topologyVersion twice as part of stepdown: once when we disable writes, and again when we complete the transition to secondary). An alternative is to delay the killOps logic until the topologyVersion has been properly incremented.
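
      The first proposal can be sketched with a toy Python model of the stepdown ordering. ToyServer and all of its methods are hypothetical illustrations of the sequence described in this ticket, not the server's actual implementation.

```python
class ToyServer:
    """Toy model of stepdown ordering; not real server code."""

    def __init__(self) -> None:
        self.topology_version = 0
        # topologyVersion attached to each killed operation's error response.
        self.killed_op_versions = []

    def _bump(self) -> None:
        self.topology_version += 1

    def _kill_user_ops(self) -> None:
        # The error response carries whatever version is current right now.
        self.killed_op_versions.append(self.topology_version)

    def step_down(self, bump_before_kill: bool) -> int:
        """Run the toy stepdown; return the version seen before stepdown."""
        version_before = self.topology_version
        if bump_before_kill:
            self._bump()          # proposed extra increment
        self._kill_user_ops()     # killOp thread starts here
        self._bump()              # writes disabled on the primary
        self._bump()              # transition to SECONDARY complete
        return version_before


current = ToyServer()
before = current.step_down(bump_before_kill=False)
print(current.killed_op_versions[0] > before)   # False: error looks unchanged

proposed = ToyServer()
before = proposed.step_down(bump_before_kill=True)
print(proposed.killed_op_versions[0] > before)  # True: error carries a newer version
```

      In this model the extra increment brings the total to three per stepdown, and the killed operation's error response always carries a version newer than the one clients observed before stepdown began.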


            People

            Assignee:
            matthew.russotto Matthew Russotto
            Reporter:
            jason.chan Jason Chan
            Votes:
            0
            Watchers:
            14
