Core Server / SERVER-21018

Prevent priority takeover during an outstanding election

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.2.0-rc1
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Repl B (10/30/15)
    • 0

      In jstests/replsets/tags.js, which uses 5 nodes, node 1 has priority 2 and node 2 has priority 3. In the following scenario, the lower-priority node 1 steals the primary role from node 2 until node 2 takes over again.

      1. Node 0 becomes the primary in term 1 at the beginning.
      2. Node 2 starts a new election in term 2 because it has a higher priority than Node 0.
      3. Node 1 gets the vote request and votes yes. It updates its term to 2.
      4. Node 1 considers starting a new election because, from its point of view, Node 0 is still the legitimate primary and has a lower priority. Node 1 schedules a take-over to run several seconds later.
      5. Node 2 gathers enough votes and announces its win.
      6. On Node 1, the scheduled take-over happens and steals the primary.

      If step 4 happens before step 3, everything is fine, since the term update cancels the priority take-over. If step 4 happens after step 5, it is also fine, because Node 1 won't stand for election once it knows Node 2 is the new primary. The window between step 2 and step 4 is usually only several milliseconds, so the race is unlikely but still possible.
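
      The three orderings described above can be replayed with a toy model of Node 1. This is only a sketch, not MongoDB's replication code: the class and method names are hypothetical, and Node 0's priority of 1 is an assumption, since the issue does not state it.

```python
from dataclasses import dataclass


@dataclass
class Node1Model:
    """Toy model of Node 1's local state (hypothetical, for illustration)."""
    priority: int = 2              # Node 1's priority, as stated in the issue
    term: int = 1                  # Node 0 was elected in term 1
    primary_priority: int = 1      # ASSUMPTION: Node 0's priority is 1
    takeover_scheduled: bool = False

    def on_vote_request(self, term: int) -> None:
        # Step 3: a newer term updates ours and cancels a pending take-over.
        if term > self.term:
            self.term = term
            self.takeover_scheduled = False

    def consider_takeover(self) -> None:
        # Step 4: schedule a take-over if the known primary has lower priority.
        if self.primary_priority < self.priority:
            self.takeover_scheduled = True

    def on_new_primary(self, priority: int) -> None:
        # Step 5: learn that the election winner is now primary.
        self.primary_priority = priority


def takeover_fires(order: list[int]) -> bool:
    """Apply steps 3, 4 and 5 in the given order; report whether step 6 fires."""
    node = Node1Model()
    steps = {
        3: lambda: node.on_vote_request(2),   # Node 2 asks for votes in term 2
        4: node.consider_takeover,
        5: lambda: node.on_new_primary(3),    # Node 2 has priority 3
    }
    for step in order:
        steps[step]()
    return node.takeover_scheduled


print(takeover_fires([3, 4, 5]))  # True: the problematic ordering
print(takeover_fires([4, 3, 5]))  # False: the term update cancels the take-over
print(takeover_fires([3, 5, 4]))  # False: Node 1 already knows the new primary
```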

      To solve this problem, a node could schedule the take-over only if the current primary is in the latest term that node knows of, which prevents the take-over from being scheduled in step 4. In other words, if the replica set is not stable, a node won't try to take over as primary.
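
      A minimal sketch of the proposed check follows. The function name and parameters are hypothetical and do not correspond to identifiers in the server code; the sketch only assumes that a node tracks its own term and the term in which the known primary was elected.

```python
def should_schedule_takeover(my_term: int, my_priority: int,
                             primary_term: int, primary_priority: int) -> bool:
    """Hypothetical guard: allow a priority take-over only in a stable set."""
    # If the primary was elected in an older term than the latest one we know,
    # an election is outstanding, so do not schedule a take-over.
    if primary_term != my_term:
        return False
    # Otherwise, the usual priority comparison applies.
    return primary_priority < my_priority


# After step 3: Node 1 is in term 2, but Node 0 is primary from term 1.
print(should_schedule_takeover(my_term=2, my_priority=2,
                               primary_term=1, primary_priority=1))  # False

# Stable set: both agree on term 1 and the primary has lower priority.
print(should_schedule_takeover(my_term=1, my_priority=2,
                               primary_term=1, primary_priority=1))  # True
```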

            Assignee:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Reporter:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Votes:
            0
            Watchers:
            7
