Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-30727

Secondary stuck in loop of current election

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Icon: Major - P3 Major - P3
    • None
    • 3.4.3
    • Replication
    • None
    • ALL

    Description

      Hi,

      This was witnessed twice where the events happened in the following order.
      1. Primary(A) is down.
      2. Secondary(B) detects and starts an election.
      3. It wins the dry-run election and on every subsequent election timeout(10s) prints the following. This goes on until its restarted.

      [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election again; already candidate
      

      4. A is started. It performs the election and is primary.
      5. B is restarted. On one instance it took a long time to shutdown and in another had to be SIGKILLed.
      6. A is shutdown.
      7. B detects, starts elections and becomes primary.

      Relevant logs from Node A for step 1-4

      2017-08-17T05:32:08.534-0700 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
      2017-08-17T05:32:08.534-0700 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
      2017-08-17T05:32:08.534-0700 I REPL     [ReplicationExecutor] VoteRequester(term 8 dry run) received a yes vote from nodeB:27017; response message: { term: 8, voteGranted: true, reason: "", ok: 1.0 }
      2017-08-17T05:32:08.534-0700 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
      2017-08-17T05:32:08.621-0700 I REPL     [ReplicationExecutor] VoteRequester(term 9) received a yes vote from node-arbiter:27022; response message: { term: 9, voteGranted: true, reason: "", ok: 1.0 }
      2017-08-17T05:32:08.621-0700 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 9
      

      Relevant logs from Node B for step 1-4

      2017-08-17T05:18:32.066-0700 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
      2017-08-17T05:18:32.066-0700 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
      2017-08-17T05:18:32.068-0700 I REPL     [ReplicationExecutor] VoteRequester(term 7 dry run) failed to receive response from nodeA:27017: HostUnreachable: Connection refused
      2017-08-17T05:18:32.069-0700 I REPL     [ReplicationExecutor] VoteRequester(term 7 dry run) received a yes vote from node-Arbiter:27022; response message: { term: 7, voteGranted: true, reason: "", ok: 1.0 }
      2017-08-17T05:18:32.069-0700 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
      2017-08-17T05:18:42.441-0700 I REPL     [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election again; already candidate
      2017-08-17T05:18:53.473-0700 I REPL     [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election again; already candidate
      2017-08-17T05:19:04.837-0700 I REPL     [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election again; already candidate
      ...
      

      Note that, although nodeB didn't log any voteRequester messages, it does increase the election term from 7 to 8. This can be confirmed by the term number used by nodeA for dry-run election, 8.

      Assuming that nothing went wrong in starting the election, I believe this loop can happen when a node waits indefinitely for voteRequests after starting a real election. I couldn't find any information related to this.

      Attachments

        1. node5-1.log
          3.36 MB
        2. node5-2.log
          1.24 MB
        3. node5-arb.log
          370 kB

        Activity

          People

            william.schultz@mongodb.com William Schultz (Inactive)
            shredder12 Shashank Sahni
            Votes:
            0 Vote for this issue
            Watchers:
            19 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: