[SERVER-47948] Replica set reconfig quorum check should compare configs based on version and term Created: 04/May/20  Updated: 29/Oct/23  Resolved: 14/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.5.1, 4.4.0-rc3
Fix Version/s: 4.4.0-rc7, 4.7.0

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: safe-reconfig-related
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-48776 Remove config version and term check ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Steps To Reproduce:

load("jstests/libs/fail_point_util.js");
load("jstests/replsets/rslib.js");  // For isConfigCommitted.
 
// Test quorum check when the target node has same config version but lower config term.
 
let rst = new ReplSetTest({nodes: 3, useBridge: true});
rst.startSet();
rst.initiate();
 
let primary = rst.getPrimary();
let coll = primary.getDB("test")["test"];
let config = rst.getReplSetConfigFromNode(0);
let origVersion = config.version;
 
// Isolate the current primary (node 0) and block reconfigs from completing on node 1.
jsTestLog("Isolating the old primary.");
primary.disconnect([rst.nodes[2]]);
let fp1 = configureFailPoint(rst.nodes[1], "blockHeartbeatReconfigFinish");
 
// Do a no-op reconfig on the stale primary to advance the config version. We expect this to
// time out while waiting for the config to commit, since the primary is isolated and node 1 is
// not accepting new configs. Node 1 still needs to be connected to node 0 for the quorum check
// to pass, though.
config.version = origVersion + 1;
jsTestLog("Reconfig on old primary to advance config version to: " + config.version);
let res = primary.adminCommand({replSetReconfig: config, maxTimeMS: 2000});
assert.commandFailedWithCode(res, ErrorCodes.MaxTimeMSExpired);
 
// Now disconnect the primary from node 1 so its config does not propagate to it. Then re-enable
// node 1's ability to receive new configs.
primary.disconnect([rst.nodes[1]]);
 
// Block node 0 from accepting new configs via heartbeat.
let fp2 = configureFailPoint(rst.nodes[0], "blockHeartbeatReconfigFinish");
 
// Step up node 2 so it writes a new config with an incremented term but the same version.
jsTestLog("Stepping up node 2.");
assert.soonNoExcept(() => {
    assert.commandWorked(rst.nodes[2].adminCommand({replSetStepUp: 1}));
    return true;
});
// Wait until primary is writable.
assert.soonNoExcept(() => {return rst.nodes[2].adminCommand({isMaster: 1}).ismaster});
 
// Let node 1 now install the original new config from the old primary.
fp1.off();
 
// Wait until the config that was written on step up is committed.
assert.soon(() => isConfigCommitted(rst.nodes[2]));
 
// Disconnect node 1 from node 2 so it cannot satisfy quorum check.
rst.nodes[1].disconnect(rst.nodes[2]);
 
// Reconnect the stale primary (node 0) to the current primary (node 2) so that node 2 needs
// node 0 for the quorum check.
jsTestLog("Reconnecting old primary to new primary");
primary.reconnect([rst.nodes[2]]);
 
jsTestLog("Doing a reconfig on node 2");
config = rst.getReplSetConfigFromNode(2);
config.version = origVersion + 1;
 
assert.commandWorked(rst.nodes[2].adminCommand({replSetReconfig: config}));
primary.reconnect([rst.nodes[1], rst.nodes[2]]);
fp2.off();
rst.stopSet();

Sprint: Repl 2020-05-18
Participants:
Linked BF Score: 32

Description

Currently, when executing the quorum check for a reconfig, we compare only config versions to determine whether the sender's config is newer than the receiver's config. This can cause the quorum check to fail spuriously when the sender's config is actually newer based on term, even though it has the same version. We should update this comparison to consider both config version and term.
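
As a rough illustration of the comparison described above, the following sketch contrasts a version-only check with one that also looks at term. It is not the server's implementation: isConfigNewer is a hypothetical helper, the ordering (term first, version as tie-breaker) is an assumption, and the version and term values are made up to mirror the "same version, lower term" case from the repro.

```javascript
// Hypothetical helper, not the server's code: returns true when config `a`
// is strictly newer than config `b`.
// Assumption: term takes precedence, and version breaks ties within a term.
function isConfigNewer(a, b) {
    if (a.term !== b.term) {
        return a.term > b.term;
    }
    return a.version > b.version;
}

// Illustrative values only: same version, different terms.
const staleConfig = {version: 2, term: 1};      // left on the isolated old primary
const steppedUpConfig = {version: 2, term: 2};  // written by the new primary

// A version-only comparison cannot tell the two configs apart.
const newerByVersionOnly = steppedUpConfig.version > staleConfig.version;

// Comparing term and version identifies the stepped-up config as newer.
const newerByTermAndVersion = isConfigNewer(steppedUpConfig, staleConfig);
```

With these values, the version-only check reports that the new primary's config is not newer, which is the situation that makes the quorum check fail, while the combined check treats it as newer.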



Comments
Comment by Githook User [ 19/May/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47948 Replica set reconfig quorum check should compare configs based on version and term

(cherry picked from commit 9c1d33e0d3917fa28d471638ceabff5861b27d1f)
Branch: v4.4
https://github.com/mongodb/mongo/commit/93bd3551c899fc374d3a5088305d1b7437a1b38f

Comment by Githook User [ 13/May/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47948 Replica set reconfig quorum check should compare configs based on version and term
Branch: master
https://github.com/mongodb/mongo/commit/9c1d33e0d3917fa28d471638ceabff5861b27d1f
