  1. Core Server
  2. SERVER-43904

When stepping down, step up doesn't filter out frozen nodes

    • Fully Compatible
    • ALL
    • v4.4, v4.2, v4.0, v3.6
    • Repl 2020-10-19

      One of the recommended ways [0] to force a particular node to become primary is to freeze all non-candidate nodes and then call replSetStepDown on the primary. As of MongoDB 3.6, the stepdown code attempts to step up a candidate (by calling replSetStepUp). However, that code doesn't exclude frozen nodes when picking the candidate, and an attempt to step up a frozen node simply fails ("2019-10-09T00:24:05.517+0000 I REPL [conn352334] Not starting an election for a replSetStepUp request, since we are not electable due to: Not standing for election because I am still waiting for stepdown period to end at 2019-10-09T00:33:59.473+0000 (mask 0x20)"). This isn't particularly bad, since the unfrozen node will still call for, and win, an election, but it does make failovers slower (presumably by up to electionTimeoutMillis).
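
      For reference, the freeze-then-stepdown sequence described above can be sketched as a list of admin commands. This is only an illustration: the helper name freeze_and_step_down_plan, the host strings, and the 120-second defaults are assumptions, not anything taken from the server code. Only the replSetFreeze and replSetStepDown command names are real.

```python
from typing import Any, Dict, List, Tuple

Command = Dict[str, Any]


def freeze_and_step_down_plan(
    members: List[str],
    primary: str,
    candidate: str,
    freeze_secs: int = 120,      # assumed value, pick what suits the deployment
    step_down_secs: int = 120,   # assumed value
) -> List[Tuple[str, Command]]:
    """Return (host, admin command) pairs in the order they would be sent.

    Hypothetical helper: it only builds the plan and never talks to a server.
    """
    if primary == candidate:
        raise ValueError("candidate is already the primary")
    if primary not in members or candidate not in members:
        raise ValueError("primary and candidate must both be replica set members")

    plan: List[Tuple[str, Command]] = []
    # Freeze every node that is neither the current primary nor the candidate,
    # so that only the candidate remains electable.
    for host in members:
        if host not in (primary, candidate):
            plan.append((host, {"replSetFreeze": freeze_secs}))
    # Finally, ask the current primary to step down.
    plan.append((primary, {"replSetStepDown": step_down_secs}))
    return plan


if __name__ == "__main__":
    for host, cmd in freeze_and_step_down_plan(
        ["a:27017", "b:27017", "c:27017"], primary="a:27017", candidate="b:27017"
    ):
        print(host, cmd)
```

      Each pair would then be run against the admin database of the named host with whatever driver is in use. With this plan, the behaviour reported here shows up when the stepping-down primary picks the frozen node (c:27017 in the example) as its step-up target.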

      An alternative approach that we're using, which isn't explicitly documented, is to increase the priority of both the current primary and the candidate node, and then run replSetStepDown. I've verified in both code and logs that this consistently gets mongo to step up the candidate node. It might be nice to document this approach, since I think it improves on both approaches currently mentioned. Increasing the priority of just the candidate also works, but tends to be slower, since the "priority takeover" mechanism takes a few seconds to trigger, and it provides less control than an explicit replSetStepDown.
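
      A minimal sketch of the priority-based approach, assuming a config document shaped like the one returned by replSetGetConfig. The helper name raise_priorities and the choice of "highest existing priority plus one" are illustrative assumptions; the report only says that both priorities are increased.

```python
import copy
from typing import Any, Dict


def raise_priorities(
    config: Dict[str, Any], primary: str, candidate: str, boost: int = 1
) -> Dict[str, Any]:
    """Return a copy of `config` with the primary and the candidate raised to
    the same priority, above every other member.

    Hypothetical helper: the input is left untouched and nothing is sent
    to a server. MongoDB itself limits priority to the range 0 to 1000;
    this sketch does not enforce that bound.
    """
    new_config = copy.deepcopy(config)
    hosts = {m["host"] for m in new_config["members"]}
    if primary not in hosts or candidate not in hosts:
        raise ValueError("primary and candidate must both be in the config")

    # Members without an explicit priority default to 1.
    top = max(m.get("priority", 1) for m in new_config["members"])
    for member in new_config["members"]:
        if member["host"] in (primary, candidate):
            member["priority"] = top + boost

    # A raw replSetReconfig expects an incremented config version.
    new_config["version"] = config["version"] + 1
    return new_config


if __name__ == "__main__":
    cfg = {
        "_id": "rs0",
        "version": 3,
        "members": [
            {"_id": 0, "host": "a:27017", "priority": 1},
            {"_id": 1, "host": "b:27017", "priority": 1},
            {"_id": 2, "host": "c:27017", "priority": 1},
        ],
    }
    print(raise_priorities(cfg, primary="a:27017", candidate="b:27017"))
```

      The resulting document would be passed to replSetReconfig on the primary, followed by replSetStepDown as described above. Restoring the original priorities afterwards is left to the operator.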

      [0] https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/

            Assignee: Xuerui Fa (xuerui.fa@mongodb.com)
            Reporter: David Bartley (bartle)
            Votes: 0
            Watchers: 17
