[SERVER-47263] Inaccurate log message when losing dry run election Created: 02/Apr/20  Updated: 29/Oct/23  Resolved: 25/Jun/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.2.3
Fix Version/s: 4.4.1, 4.7.0, 4.2.10

Type: Bug Priority: Major - P3
Reporter: Joanna Cheng Assignee: Pridhvi Vegesna (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image.png    
Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Repl 2020-06-01, Repl 2020-06-15, Repl 2020-06-29
Participants:

 Description   

When a node loses a dry run election, it logs the message

Lost dry run election due to internal error

This is misleading, at least in the cases I saw - it loses the dry run election for a pretty standard reason (new term started, not enough votes). This is not really an "internal error", just the mechanics of elections.

This also applies with LOGV2, where we have "Lost election due to internal error" with an "isDryRun" attribute, but it's still misleading. This is usually not (is never?) an internal error:
https://github.com/mongodb/mongo/blob/edb8778350326d2b33f056b1b5f0b25a4b5b444a/src/mongo/db/repl/replication_coordinator_impl_elect_v1.cpp#L58

Here are two examples of this error message

2020-03-26T17:53:53.115+0000 I  ELECTION [replexec-37] Starting an election, since we've seen no PRIMARY in the past 5000ms
2020-03-26T17:53:53.115+0000 I  ELECTION [replexec-37] conducting a dry run election to see if we could be elected. current term: 132
2020-03-26T17:53:53.116+0000 I  REPL     [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683059 -- target:myreplset-00-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } }
2020-03-26T17:53:53.116+0000 I  REPL     [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683060 -- target:myreplset-02-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } }
...
2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-38] VoteRequester(term 132 dry run) received a no vote from myreplset-00-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) }
2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] VoteRequester(term 132 dry run) received a no vote from myreplset-02-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) }
2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] not running for primary, we received insufficient votes
2020-03-26T17:53:53.121+0000 I  ELECTION [replexec-39] Lost dry run election due to internal error

2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] VoteRequester(term 132 dry run) failed to receive response from myreplset-02-abcde.mongodb.net:27017: NetworkInterfaceExceededTimeLimit: Request 1115132387 timed out, deadline was 2020-03-27T02:37:15.225+0000, op was RemoteCommand 1115132387 -- target:[myreplset-02-abcde.mongodb.net:27017] db:admin expDate:2020-03-27T02:37:15.225+0000 cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585276624, 7), t: 132 } }
2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] not running for primary, we have been superseded already during dry run. original term: 132, current term: 133
2020-03-27T02:37:15.225+0000 I  ELECTION [replexec-52] Lost dry run election due to internal error



 Comments   
Comment by Githook User [ 20/Aug/20 ]

Author:

{'name': 'PV99', 'email': 'pridhvi.vegesna@mongodb.com', 'username': 'PV99'}

Message: SERVER-47263 dryrun election failure logging - removed inaccurate portion of log message

(cherry picked from commit e24e32fc7e4fc69c856ba8999163c1aa40d32b4c)
Branch: v4.4
https://github.com/mongodb/mongo/commit/9cd84ed4054e0bf718faa76a745a427c5aabebee

Comment by Githook User [ 13/Aug/20 ]

Author:

{'name': 'PV99', 'email': 'pridhvi.vegesna@mongodb.com', 'username': 'PV99'}

Message: SERVER-47263 dryrun election failure logging - removed inaccurate portion of log message

(cherry picked from commit e24e32fc7e4fc69c856ba8999163c1aa40d32b4c)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7365cfc9e7b0079fe97bcbf5dbc0c423b6c97c6d

Comment by Githook User [ 25/Jun/20 ]

Author:

{'name': 'PV99', 'email': 'pridhvi.vegesna@mongodb.com'}

Message: SERVER-47263 dryrun election failure logging - removed inaccurate portion of log message
Branch: master
https://github.com/mongodb/mongo/commit/e24e32fc7e4fc69c856ba8999163c1aa40d32b4c

Generated at Thu Feb 08 05:13:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.