[SERVER-46387] Only vote for candidate with same config version and term as self Created: 25/Feb/20  Updated: 29/Oct/23  Resolved: 17/Mar/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.3.3
Fix Version/s: 4.4.0-rc0, 4.7.0

Type: Bug
Priority: Major - P3
Reporter: William Schultz (Inactive)
Assignee: Siyuan Zhou
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Depends
is depended on by SERVER-46667 Avoid invariant from invalid candidat... Closed
Problem/Incident
causes SERVER-57262 Allow nodes to vote for candidates wi... Closed
Related
related to SERVER-45080 Voters should reject a candidate if t... Closed
related to SERVER-47430 Update TLA+ to only vote for candidat... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Steps To Reproduce:

load("jstests/libs/fail_point_util.js");  // For configureFailPoint.
//
// Test sending a vote request to a removed node.
//
 
// Start out with {n1, n2, n3}
let rst = new ReplSetTest({nodes: 3});
rst.startSet();
rst.initiate();
 
let primary = rst.getPrimary();
 
// Save the host of the node that will become removed.
let removedHost = rst.nodes[2].host;
 
// Remove n3 from the config.
let config = rst.getReplSetConfigFromNode();
let origConfig = Object.assign({}, config);
config.members = config.members.slice(0, 2);
config.version++;
assert.commandWorked(primary.adminCommand({replSetReconfig: config}));
 
// Give plenty of time for config to propagate.
sleep(5000);
 
// Block the removed secondary from installing new configs via heartbeat at this point. This is to
// simulate a case where heartbeats are propagating very slowly for some reason between nodes.
let removedConn = new Mongo(removedHost);
let fp = configureFailPoint(removedConn, "blockHeartbeatReconfigFinish");
 
// Reconfig back to the original config: {n1, n2, n3}. n3 will not hear about this yet, though,
// and still think it is REMOVED.
origConfig.version = config.version + 1;
assert.commandWorked(primary.adminCommand({replSetReconfig: origConfig}));
 
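// Step the primary down and back up. When it runs for election, it sends a vote request to n3,
// which still thinks it is REMOVED.
assert.commandWorked(primary.adminCommand({replSetStepDown: 1, force: true}));
sleep(2000);
assert.commandWorked(primary.adminCommand({replSetStepUp: 1}));
// Wait for a primary. Before the fix, n3 hit an invariant failure while processing the vote request.
rst.getPrimary();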
 
rst.stopSet();

Sprint: Repl 2020-03-23

 Description   

If we remove a secondary from a replica set config, it enters the REMOVED state and records its selfIndex as -1. A primary can then reconfig to add this secondary back into the config and run for a new election before the secondary learns that it is no longer REMOVED. In that case, the candidate may send a vote request to the still-REMOVED secondary. While processing the request, the secondary looks itself up via the TopologyCoordinator::_selfConfig method, which has an invariant on selfIndex. Because the node is REMOVED, its selfIndex is -1, so the invariant is violated.
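The failure mode and the fix named in the ticket title can be sketched with a small, hypothetical model in plain JavaScript (runnable with Node, not mongo shell or server code). Every name here (`selfConfig`, `sameConfigVersionAndTerm`, `processVoteRequestUnguarded`, `processVoteRequest`) is an illustrative assumption. The real check lives in the server's C++ replication code, and the other vote checks (election term, oplog freshness, prior votes) are omitted.

```javascript
// Hypothetical, simplified model of a voter handling a vote request.
// Not server code: the config's "version and term" are modeled as two numbers.

// True if both configs have the same (version, term) pair.
function sameConfigVersionAndTerm(a, b) {
    return a.version === b.version && a.term === b.term;
}

// Stand-in for TopologyCoordinator::_selfConfig: a REMOVED node has
// selfIndex -1, so looking itself up violates the invariant.
function selfConfig(voter) {
    if (voter.selfIndex < 0) {
        throw new Error("Invariant failure: selfIndex >= 0");
    }
    return voter.config.members[voter.selfIndex];
}

// Behavior described in the ticket: the voter looks itself up without first
// checking that it shares the candidate's config.
function processVoteRequestUnguarded(voter, request) {
    const self = selfConfig(voter);  // throws for a REMOVED voter
    return {voteGranted: true, voter: self.host};  // other checks omitted
}

// Sketch of the fix: reject unless the candidate's config version and term
// match the voter's own, before ever calling selfConfig().
function processVoteRequest(voter, request) {
    if (!sameConfigVersionAndTerm(voter.config, request.config)) {
        return {voteGranted: false, reason: "candidate config version/term differs from ours"};
    }
    const self = selfConfig(voter);
    return {voteGranted: true, voter: self.host};  // other checks omitted
}

// Illustrative state (numbers are made up): n3 was removed and still holds the
// older config, while the candidate uses the newer config that re-adds n3.
const removedVoter = {
    selfIndex: -1,
    config: {version: 2, term: 1, members: [{host: "n1"}, {host: "n2"}]},
};
const request = {candidate: "n1", config: {version: 3, term: 1}};

try {
    processVoteRequestUnguarded(removedVoter, request);
} catch (e) {
    console.log(e.message);  // prints "Invariant failure: selfIndex >= 0"
}
console.log(processVoteRequest(removedVoter, request).voteGranted);  // prints false
```

Because the config comparison happens first, a node that still considers itself REMOVED rejects a request carrying a newer config before it ever reaches the self lookup.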



 Comments   
Comment by Githook User [ 26/Mar/20 ]

Author: Siyuan Zhou (username: visualzhou, email: siyuan.zhou@mongodb.com)

Message: SERVER-46387 Only vote for candidate with same config version and term as self.

(cherry picked from commit e063494b7379d8445bd23016819dcccbb3bd1dd3)
Branch: v4.4
https://github.com/mongodb/mongo/commit/4a8ace6c37683b8bc422ecb16d2c67d468c5eb9f

Comment by Githook User [ 17/Mar/20 ]

Author: Siyuan Zhou (username: visualzhou, email: siyuan.zhou@mongodb.com)

Message: SERVER-46387 Only vote for candidate with same config version and term as self.
Branch: master
https://github.com/mongodb/mongo/commit/e063494b7379d8445bd23016819dcccbb3bd1dd3

Comment by William Schultz (Inactive) [ 25/Feb/20 ]

I believe this may have been introduced in SERVER-45080, since in 4.2 we don't accept a vote request at all if it is from a different config.

Generated at Thu Feb 08 05:11:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.