Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-47263

Inaccurate log message when losing dry run election

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.2.3
    • Fix Version/s: 4.2.10, 4.4.1, 4.7.0
    • Component/s: Replication
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4, v4.2, v4.0
    • Sprint:
      Repl 2020-06-01, Repl 2020-06-15, Repl 2020-06-29

      Description

      When a node loses a dry run election, it logs the message

      Lost dry run election due to internal error
      

      This is misleading, at least in the cases I saw - it loses the dry run election for a pretty standard reason (new term started, not enough votes). This is not really an "internal error", just the mechanics of elections.

      This also applies with LOGV2, where we have "Lost election due to internal error" with an "isDryRun" attribute, but it's still misleading. This is usually not (is never?) an internal error:
      https://github.com/mongodb/mongo/blob/edb8778350326d2b33f056b1b5f0b25a4b5b444a/src/mongo/db/repl/replication_coordinator_impl_elect_v1.cpp#L58

      Here are two examples of this error message

      2020-03-26T17:53:53.115+0000 I  ELECTION [replexec-37] Starting an election, since we've seen no PRIMARY in the past 5000ms
      2020-03-26T17:53:53.115+0000 I  ELECTION [replexec-37] conducting a dry run election to see if we could be elected. current term: 132
      2020-03-26T17:53:53.116+0000 I  REPL     [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683059 -- target:myreplset-00-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } }
      2020-03-26T17:53:53.116+0000 I  REPL     [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683060 -- target:myreplset-02-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } }
      ...
      2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-38] VoteRequester(term 132 dry run) received a no vote from myreplset-00-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) }
      2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] VoteRequester(term 132 dry run) received a no vote from myreplset-02-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) }
      2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] not running for primary, we received insufficient votes
      2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] Lost dry run election due to internal error
      

      2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] VoteRequester(term 132 dry run) failed to receive response from myreplset-02-abcde.mongodb.net:27017: NetworkInterfaceExceededTimeLimit: Request 1115132387 timed out, deadline was 2020-03-27T02:37:15.225+0000, op was RemoteCommand 1115132387 -- target:[myreplset-02-abcde.mongodb.net:27017] db:admin expDate:2020-03-27T02:37:15.225+0000 cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585276624, 7), t: 132 } }
      2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] not running for primary, we have been superseded already during dry run. original term: 132, current term: 133
      2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] Lost dry run election due to internal error
      

        Attachments

          Activity

            People

            Assignee:
            pridhvi.vegesna Pridhvi Vegesna (Inactive)
            Reporter:
            joanna.cheng Joanna Cheng
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: