[SERVER-44330] Investigate the behavior of concurrent replSetInitiate on different nodes Created: 31/Oct/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

Running replSetInitiate concurrently against different nodes may split the nodes of a replica set into two groups: each initiating node writes its own no-op oplog entry, yet both resulting configs carry version 1. We need to investigate the consequences. It is acceptable if one group eventually rolls back its no-op entry and syncs from the other, or if the nodes fail loudly; it would be worse if no primary could be elected at all. We must not allow more than one primary in the same term, rollback of majority-committed data, or inconsistency between the data and the oplog.

Similar questions arise when the configs passed to the concurrent replSetInitiate commands share some, but not all, of their member nodes (see the sketch below).
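A minimal reproduction sketch of the scenario, assuming a locally running three-node deployment on ports 27017-27019 and using PyMongo; the hostnames, ports, replica set name, and driver choice are illustrative assumptions, not part of this report. Two nodes each receive their own replSetInitiate with a config at version 1; swapping in a different member list for the second call gives the partially overlapping variant.

```python
# Hypothetical reproduction sketch: two nodes of the same intended replica set
# each receive replSetInitiate concurrently. Hosts and ports are assumptions.
import threading
from pymongo import MongoClient

HOSTS = ["localhost:27017", "localhost:27018", "localhost:27019"]

def initiate(target_host, members):
    # directConnection avoids replica-set discovery before the set exists
    client = MongoClient(f"mongodb://{target_host}/?directConnection=true")
    config = {
        "_id": "rs0",
        "version": 1,  # both initiators produce a config at version 1
        "members": [{"_id": i, "host": h} for i, h in enumerate(members)],
    }
    try:
        print(target_host, client.admin.command("replSetInitiate", config))
    except Exception as exc:
        print(target_host, "failed:", exc)

# Identical member lists: the "same nodes" case described above.
# Passing e.g. HOSTS[1:] to the second call gives the overlapping-config case.
t1 = threading.Thread(target=initiate, args=(HOSTS[0], HOSTS))
t2 = threading.Thread(target=initiate, args=(HOSTS[1], HOSTS))
t1.start(); t2.start()
t1.join(); t2.join()
```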

This scenario should be extremely rare, since replSetInitiate is run at most once in the lifetime of a replica set, and it requires a user to run two replSetInitiate commands against two nodes at the same time, whether intentionally or through a misconfigured script.

