  1. Core Server
  2. SERVER-43847

Make ReplSetTest's stepUp function resilient to slow machines

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Repl 2019-10-21, Repl 2019-11-04, Repl 2019-11-18, Repl 2019-12-02, Repl 2019-12-16, Repl 2019-12-30, Repl 2020-01-13, Repl 2020-01-27, Repl 2020-02-10
    • Linked BF Score:
      15

      Description

      If a node experiences machine slowness while a replica set is being initiated, an unplanned election can occur. If that election happens during a stepUp call, specifically after the check that all nodes agree on the applied opTime, the stepUp can fail because the node being stepped up may not yet have applied the new-term oplog entry written by the newly elected primary. The intended primary then loses its election: the new primary is ahead of it and votes no. This only affects replica sets in which the candidate needs the primary's vote to reach a majority.

      The stepUp function in ReplSetTest should have logic to retry the step up in such a case.
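
A minimal sketch of what such retry logic could look like, not the actual ReplSetTest change. The helper name `stepUpWithRetry`, the `maxAttempts` parameter, and the fake node are hypothetical; the only real piece assumed here is that a connection's `adminCommand({replSetStepUp: 1})` returns a response with an `ok` field. Each attempt waits for replication again, so the candidate can catch up on any new-term oplog entry before it runs for election.

```javascript
// Hypothetical helper illustrating the retry idea; not the real ReplSetTest code.
// `node` is assumed to expose adminCommand(), as a mongo shell connection does.
// `awaitReplication` is a caller-supplied function that blocks until all nodes
// agree on the applied opTime.
function stepUpWithRetry(node, awaitReplication, maxAttempts = 5) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // Re-sync before every attempt: an unplanned election may have written
        // a new-term oplog entry that the candidate has not applied yet.
        awaitReplication();
        const res = node.adminCommand({replSetStepUp: 1});
        if (res.ok) {
            return attempt;  // number of attempts that were needed
        }
    }
    throw new Error(`stepUp failed after ${maxAttempts} attempts`);
}

// Stand-in node for demonstration: loses its first N elections, then wins.
function makeFakeNode(failuresBeforeSuccess) {
    let calls = 0;
    return {
        adminCommand(cmd) {
            calls++;
            return {ok: calls > failuresBeforeSuccess ? 1 : 0};
        },
    };
}

let syncs = 0;
const attemptsUsed = stepUpWithRetry(makeFakeNode(1), () => { syncs++; });
```

In the fake scenario the first election fails, as it would when the unplanned primary votes no, and the second attempt succeeds after another replication wait. A real implementation would also need a delay or timeout between attempts, which this sketch omits.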
