  1. Core Server
  2. SERVER-58619

Continuous Stepdown's replSetStepDown Is Not Resilient To External Elections

    • Cluster Scalability

      The _continuousPrimaryStepdownFn function runs in a background thread that continually steps down the primary. It keeps an in-memory reference to the latest primary and updates that reference each time it steps down the old primary.

      With this setup, if an election happens after the thread has updated its idea of which node is the primary, the thread is left holding a stale reference. The next time it attempts to step down the primary, the command fails with a network error.

      To solve this, we should wrap the command execution below in a try/catch. If the error is a network error, swallow the exception; otherwise rethrow it so that the higher-level try/catch handles it.

      When swallowing the exception, make sure to print out its occurrence.

                      assert.commandWorkedOrFailedWithCode(
                          primary.adminCommand(
                              {replSetStepDown: options.stepdownDurationSecs, force: true}),
                          [ErrorCodes.NotWritablePrimary, ErrorCodes.ConflictingOperationInProgress]);
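
The proposed change could look roughly like the sketch below. This is only an illustration of the try/catch described above, not the actual patch. The function name stepDownTolerantOfNetworkErrors and the deps object (isNetworkError, assertStepDownResult, log) are hypothetical and are injected so the sketch can run outside the shell. Only the replSetStepDown command document comes from the snippet above.

```javascript
// Sketch only: the function name and the `deps` helpers are assumptions made
// for illustration; the command document matches the snippet in the ticket.
function stepDownTolerantOfNetworkErrors(primary, options, deps) {
    const {isNetworkError, assertStepDownResult, log} = deps;
    try {
        const res = primary.adminCommand(
            {replSetStepDown: options.stepdownDurationSecs, force: true});
        // Caller-supplied check, e.g. "worked or failed with an expected code".
        assertStepDownResult(res);
        return true;
    } catch (e) {
        if (!isNetworkError(e)) {
            // Not a network error: rethrow for the higher-level try/catch.
            throw e;
        }
        // Network error, likely a stale primary reference: record it and move on.
        log("Ignoring network error from replSetStepDown: " + e);
        return false;
    }
}
```

Inside the test itself, assertStepDownResult would presumably wrap the existing assert.commandWorkedOrFailedWithCode call with the same two error codes, and isNetworkError and log would map to whatever network-error check and print helper the shell provides. Those mappings are assumptions to verify against the codebase.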
      

            Assignee:
            backlog-server-cluster-scalability [DO NOT USE] Backlog - Cluster Scalability
            Reporter:
            luis.osta@mongodb.com Luis Osta (Inactive)