[SERVER-45274] priorPrimaryMemberId electionCandidateMetric should track most recent non-empty primary seen Created: 20/Dec/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-45493 temporarily disable failing assertion... Closed
Assigned Teams:
Replication
Operating System: ALL
Backport Requested:
v4.2, v4.0

 Description   

The test (election_candidate_and_participant_metrics.js) assumes that, on a stepdown with election handoff, the new primary will run for election before it hears that the old primary stepped down. Nothing guarantees that ordering, so the test is racy and prone to failure.
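The race can be made concrete with a small model. The sketch below is hypothetical: the class, the event names, and the idea that a heartbeat is the only way the candidate learns of the stepdown are simplifying assumptions, not MongoDB's actual replication code. It assumes priorPrimaryMemberId is captured from the candidate's current view of the primary at the moment it calls the election, as described in the 20/Dec/19 comment. With election handoff the old primary asks a secondary to step up, but the candidate can still observe "no primary" first.

```python
from typing import List, Optional


class CandidateView:
    """Hypothetical model of one secondary's view of who is primary."""

    def __init__(self, primary_id: Optional[int]) -> None:
        self.current_primary: Optional[int] = primary_id

    def on_heartbeat(self, primary_id: Optional[int]) -> None:
        # A heartbeat reports who (if anyone) is primary right now.
        self.current_primary = primary_id

    def call_election(self) -> dict:
        # Current behavior as described in this ticket: the field is set
        # only if a primary is visible when the election is called.
        metrics = {}
        if self.current_primary is not None:
            metrics["priorPrimaryMemberId"] = self.current_primary
        return metrics


def run(events: List[str], old_primary: int = 0) -> dict:
    """Replay an event ordering and return the candidate's metrics."""
    view = CandidateView(old_primary)
    metrics: dict = {}
    for event in events:
        if event == "heartbeat_sees_stepdown":
            view.on_heartbeat(None)
        elif event == "step_up_request":
            metrics = view.call_election()
    return metrics


# Ordering the test assumes: the candidate runs before it learns of the stepdown.
print(run(["step_up_request", "heartbeat_sees_stepdown"]))  # → {'priorPrimaryMemberId': 0}
# Equally possible ordering: the stepdown is observed first.
print(run(["heartbeat_sees_stepdown", "step_up_request"]))  # → {}
```

Both orderings are legal, so an assertion that the field is always present depends on timing.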



 Comments   
Comment by Judah Schvimer [ 09/Jan/20 ]

Yes.

Comment by Evin Roesle [ 09/Jan/20 ]

It would be ideal for us to know the prior primary ID even when no primary is currently visible. I want to be able to determine how often network partitions happen by looking at the prior primary ID and checking the health of that node: whether it crashed, was down, or just wasn't visible to the node running for election. Does this ticket mean we can't get that information today?
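One way to do the lookup described above is to read priorPrimaryMemberId from a replSetGetStatus reply and match it against the same reply's members array. The helper below is a minimal sketch; its name is hypothetical. It takes the reply as a plain dict (for example, the result of PyMongo's `client.admin.command("replSetGetStatus")`), so it can be exercised without a server. Per this ticket, it returns None whenever the candidate saw no primary at election time. Also note that `health` reflects only this node's heartbeat view, so an unreachable member looks the same whether it crashed or was partitioned away.

```python
from typing import Any, Dict, Optional


def prior_primary_health(status: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the prior primary's health from a replSetGetStatus reply.

    Returns None if electionCandidateMetrics or priorPrimaryMemberId is
    absent, or if that member is no longer listed in the reply.
    """
    metrics = status.get("electionCandidateMetrics", {})
    prior_id = metrics.get("priorPrimaryMemberId")
    if prior_id is None:
        # The candidate saw no primary when it called the election.
        return None
    for member in status.get("members", []):
        if member.get("_id") == prior_id:
            # health is 1 if heartbeats reach the member, 0 otherwise.
            return {
                "_id": prior_id,
                "health": member.get("health"),
                "stateStr": member.get("stateStr"),
            }
    return None
```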

Comment by Githook User [ 20/Dec/19 ]

Author:

Judah Schvimer (judahschvimer) <judah.schvimer@10gen.com>

Message: SERVER-45274 improve logging in election_candidate_and_participant_metrics.js
Branch: master
https://github.com/mongodb/mongo/commit/c4d21d70c7ae90d1ef8635ff81cfc58d8926c770

Comment by Judah Schvimer [ 20/Dec/19 ]

Note for schedulers: waiting on Aly's response to the question below to determine whether the C++ behavior is fine and we can just adjust the test, or whether we need to change the C++ behavior to match what the test expects.

Comment by Judah Schvimer [ 20/Dec/19 ]

alyson.cabral, this test failed because the "priorPrimaryMemberId" field in replSetGetStatus "electionCandidateMetrics" represents the primary at the time the election was called, NOT the last primary the node was aware of. If there is any period of 0 primaries between the previous election and the new election, then the "priorPrimaryMemberId" field will be missing. Is this behavior acceptable, desired, or undesired?
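The two semantics in question can be contrasted with a minimal tracker. This is a hypothetical sketch, not server code: `current_primary` models the current behavior (the primary visible when the election is called), while `last_seen_primary` models what the ticket title proposes (the most recent non-empty primary seen). The sketch does not address whether the latter should ever be reset, for example after this node itself wins.

```python
from typing import Optional


class PrimaryTracker:
    """Hypothetical tracker contrasting the two priorPrimaryMemberId semantics."""

    def __init__(self) -> None:
        self.current_primary: Optional[int] = None
        # Proposed semantics: not cleared when a heartbeat reports no primary.
        self.last_seen_primary: Optional[int] = None

    def observe(self, primary_id: Optional[int]) -> None:
        self.current_primary = primary_id
        if primary_id is not None:
            self.last_seen_primary = primary_id

    def prior_primary_current_semantics(self) -> Optional[int]:
        # Primary at the moment the election is called; None during a gap.
        return self.current_primary

    def prior_primary_proposed_semantics(self) -> Optional[int]:
        # Most recent non-empty primary seen.
        return self.last_seen_primary


tracker = PrimaryTracker()
tracker.observe(0)     # member 0 is primary
tracker.observe(None)  # member 0 steps down; briefly no primary
print(tracker.prior_primary_current_semantics())   # → None
print(tracker.prior_primary_proposed_semantics())  # → 0
```

Under the proposed semantics, a period with zero primaries between elections no longer causes the field to go missing.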

Generated at Thu Feb 08 05:08:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.