- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Labels: None
- Fully Compatible
- Server Serverless 2022-05-02, Server Serverless 2022-05-16, Server Serverless 2022-05-30
To minimize the duration of a shard split, we want to trigger an election manually rather than wait for the election timeout. The shard split service will send a `replSetStepUp` command to one of the recipient nodes so that a primary is elected as soon as possible. If the step up fails, it will select another node and send the command again.
One optimization to this method would be to disable replication on all recipient nodes at the same time, to ensure they all have the same oplog and that the replSetStepUp succeeds. It was deemed too complicated for now and the idea was put aside (see Previous context for more info).
Previous context:
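The retry behavior described above can be sketched as follows. This is only an illustration, not the server implementation: `elect_recipient_primary` and the `send_step_up` callback are hypothetical names, and trying each node at most once in random order is an assumption about how "select another node" is carried out.

```python
import random
from typing import Callable, Optional, Sequence


def elect_recipient_primary(
    recipient_hosts: Sequence[str],
    send_step_up: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Send replSetStepUp to recipient nodes until one of them succeeds.

    `send_step_up` is a hypothetical callback that issues the command to
    the given host and returns True if the step up succeeded.
    Returns the host that stepped up, or None if every attempt failed.
    """
    rng = rng or random.Random()
    candidates = list(recipient_hosts)
    rng.shuffle(candidates)  # assumption: random order, each node tried once
    for host in candidates:
        if send_step_up(host):
            return host  # a primary was elected, stop retrying
    return None
```

In a real client the callback would wrap the actual `replSetStepUp` admin command and treat a command error as a failed attempt.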
After SERVER-64935, we will send a replSetStepUp command to a random recipient node in order to run an immediate election. This node may lose the election if its replication state is older than that of the other nodes, in which case we might need to retry the election against another node. To ensure that any selected recipient node is electable, we should pause replication on all recipient nodes at the same time, which guarantees they have an equivalent replication state.
We can use the split state document as the tombstone for this: if the state is kBlocking and the current node is tagged with recipientTagName, then pause replication on this node. Once a new primary is elected, re-enable replication. Note that we may still need to clear the sync state to ensure that when replication is restarted, it does not start syncing from one of the donor nodes.
Some additional benefits to this approach:
- Recipient nodes will not need to perform replication rollback after the election
- We will prevent unnecessary replication traffic for data that will be deleted during orphan cleanup after the split operation completes
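The pause condition from the previous context can be written as a small predicate. This is a hedged sketch only: the `SplitState` enum, its members other than the blocking state, and the `should_pause_replication` function are hypothetical and do not mirror the server's types. Only the kBlocking state and the recipientTagName check come from the description above.

```python
from enum import Enum
from typing import Mapping


class SplitState(Enum):
    # Illustrative states only; BLOCKING stands in for kBlocking.
    BLOCKING = "blocking"
    COMMITTED = "committed"
    ABORTED = "aborted"


def should_pause_replication(
    state: SplitState,
    node_tags: Mapping[str, str],
    recipient_tag_name: str,
    primary_elected: bool,
) -> bool:
    """Return True while this node should keep replication paused.

    Replication is paused only on nodes tagged with the recipient tag
    while the split is blocking, and is re-enabled once a new primary
    has been elected.
    """
    is_recipient = recipient_tag_name in node_tags
    return state is SplitState.BLOCKING and is_recipient and not primary_elected
```

For example, a donor node (without the recipient tag) never pauses, and a recipient node resumes as soon as `primary_elected` becomes true.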
is duplicated by:
- SERVER-64935 Send replSetStepUp to a random recipient node after installing the split config (Closed)