Abort election if freshness check cannot ensure majority of voters


    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • 2.7.8
    • Affects Version/s: None
    • Component/s: Replication
    • Minor Change
    • 3

      Add a requirement to the freshness check: before starting an election, confirm that a majority of voters can vote in it. This check provides a fail-fast path when a majority won't be voting, and might reduce election times when a majority later becomes available.

      Longer Explanation
      This issue is similar to SERVER-14382, in that you don't want the election protocol to call CmdReplSetElect if you have any reason to believe that the election will not be successful. Calling CmdReplSetElect and failing is bad because members that do vote "yes" will be barred from voting for 30 seconds.

      A candidate determines that a majority of the replica set is up by looking at the state of member heartbeats, via Consensus::aMajoritySeemsToBeUp(). If the network was just partitioned, this information may be inaccurate, because the latest heartbeats may not have failed yet. So, when a candidate calls CmdReplSetFresh to gauge whether it should run an election, it should count the number of responses it gets to ensure a majority is truly up. With the current code, fewer than a majority may respond saying "go ahead", and the election protocol still proceeds to call CmdReplSetElect.
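
      The proposed check can be sketched as follows. This is a minimal illustration in Python, not the server's C++ code: `FreshReply`, `should_start_election`, and their fields are hypothetical names, and the sketch assumes the candidate counts its own votes toward the majority and that an unreachable member is represented as `None`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FreshReply:
    """Hypothetical summary of one member's reply to a freshness query."""
    votes: int             # votes this member holds in the set configuration
    fresher: bool = False  # member reports newer data than the candidate
    veto: bool = False     # member explicitly vetoes the election


def should_start_election(
    self_votes: int,
    total_votes: int,
    replies: Iterable[Optional[FreshReply]],
) -> bool:
    """Return True only if no reply objects AND a strict majority responded.

    A reply of None stands for a member that did not answer (for example,
    it sits on the other side of a fresh network partition).
    """
    responding_votes = self_votes  # assumption: the candidate counts itself
    for reply in replies:
        if reply is None:
            continue  # no response: contributes no votes
        if reply.fresher or reply.veto:
            return False  # existing behavior: someone objects, so abort
        responding_votes += reply.votes
    # The additional requirement: a strict majority of all votes responded.
    return responding_votes * 2 > total_votes
```

      For example, in a five-member set with one vote each, a candidate that hears back from only one other member has 2 of 5 votes accounted for, so it would abort before calling CmdReplSetElect and no member would be barred from voting.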

            Assignee:
            Scott Hernandez (Inactive)
            Reporter:
            Zardosht Kasheff
            Votes:
            1
            Watchers:
            9

              Created:
              Updated:
              Resolved: