Core Server / SERVER-8483

reconfig may cause problem re-electing primary



    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication


      Setup is this:

      A replica set with 4 nodes (A, B, C, D), all with priority 0 except the first node, A, so only A can become primary.

      Nodes B and C have slaveDelay set to either 0 or 40s, alternating via reconfigs.

      Node D blackholed from node A, symmetrically (A can't talk to D, D can't talk to A).
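
For illustration, the configs that the reconfigs alternate between might look like the sketch below. It is not taken from the attached test. The set name `testSet`, the `localhost` hosts, the ports (guessed from the `m31000` log prefix) and the helper name `buildReplSetConfig` are all assumptions, as is the reading that exactly one of B and C is delayed at a time.

```javascript
// Hypothetical helper, not from sync_change_source.js: builds a replica set
// config for the setup described above.
//   version      - config version; must be higher than the currently active one
//   delayedIndex - 1 to delay node B by 40s, 2 to delay node C by 40s
function buildReplSetConfig(version, delayedIndex) {
    if (delayedIndex !== 1 && delayedIndex !== 2) {
        throw new Error("delayedIndex must be 1 (node B) or 2 (node C)");
    }
    // Assumed hosts for nodes A, B, C, D.
    var hosts = ["localhost:31000", "localhost:31001",
                 "localhost:31002", "localhost:31003"];
    var members = hosts.map(function (host, i) {
        // Only node A (index 0) has a non-zero priority and can become primary.
        var member = { _id: i, host: host, priority: i === 0 ? 1 : 0 };
        // Only B and C carry a slaveDelay; one gets 40s and the other 0.
        if (i === 1 || i === 2) {
            member.slaveDelay = (i === delayedIndex) ? 40 : 0;
        }
        return member;
    });
    return { _id: "testSet", version: version, members: members };
}
```

In the mongo shell, such a document could be applied on the primary with something like `db.adminCommand({ replSetReconfig: buildReplSetConfig(2, 1) })`, and the next reconfig would then pass a higher version with `delayedIndex` set to 2.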

      At first, node D correctly switches its sync source between nodes B and C, depending on which is delayed. Each time a reconfig happens, node A drops to secondary and is then re-elected primary.

      At some point, though, it seems to become impossible for node A to be elected primary again after a reconfig. There is a strange message in node A's logs:

       m31000| Thu Jan 17 17:05:00.147 [rsMgr] not electing self, would veto with ' is trying to elect itself but is already primary and more up-to-date'

      A test to reproduce the problem and the output from two runs are attached below (with replSetGetStatus output from all nodes every 5s during the problem period).


        1. currentTest_failure_same_host_veto.txt (177 kB)
        2. currentTest.txt (569 kB)
        3. sync_change_source.js (3 kB)


