[SERVER-64939] Minimize shard split duration by sending a step up command to secondary
Created: 25/Mar/22 Updated: 29/Oct/23 Resolved: 16/May/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Matt Broadstone | Assignee: | Didier Nadeau |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Server Serverless 2022-05-02, Server Serverless 2022-05-16, Server Serverless 2022-05-30 |
| Description |
To minimize the duration of a shard split, we want to trigger an election manually instead of waiting for the election timeout. The shard split service will send a `replSetStepUp` command to one of the recipient nodes so that a primary is elected as soon as possible. If the step up fails, the service selects another node and sends the command again.

One optimization to this approach would be to disable replication on the recipient nodes at the same time, so that they all have the same oplog and the `replSetStepUp` succeeds. This was deemed too complicated for now, and the idea was put aside (see Previous context for more info).

Previous context: After

We can use the split state document as this tombstone: if the state is `kBlocking` and the current node is tagged with `recipientTagName`, then pause replication on this node. Once a new primary is elected, re-enable replication. Note that we may still need to clear the sync state so that, when replication is restarted, the node does not start syncing from one of the donor nodes.

Some additional benefits to this approach:
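The step-up retry described at the start of the description (send `replSetStepUp` to one recipient node, and on failure move on to another) can be sketched as a simple loop. This is a minimal, hypothetical Python sketch, not the server's implementation (the real shard split service lives in the C++ server). `step_up_first_available`, `StepUpError`, and the injected `send_step_up` callable are illustrative names; the sketch assumes nodes are tried in the given order and that a successful command reply counts as success.

```python
from typing import Callable, Iterable, Optional


class StepUpError(Exception):
    """Hypothetical error raised when a node rejects replSetStepUp."""


def step_up_first_available(
    recipient_nodes: Iterable[str],
    send_step_up: Callable[[str], None],
) -> Optional[str]:
    """Send replSetStepUp to each recipient node in turn until one accepts.

    `send_step_up` is an assumed transport that would run the admin command
    {replSetStepUp: 1} against `host` and raise StepUpError on failure.
    Returns the host that accepted, or None if every node refused.
    """
    for host in recipient_nodes:
        try:
            send_step_up(host)
        except StepUpError:
            # This candidate could not win the election (e.g. its oplog lags);
            # select another node and send the command again.
            continue
        return host
    return None


if __name__ == "__main__":
    def fake_send_step_up(host: str) -> None:
        # Simulate the first candidate refusing the step up.
        if host == "recipient-0:27017":
            raise StepUpError(f"{host} refused replSetStepUp")

    hosts = ["recipient-0:27017", "recipient-1:27017", "recipient-2:27017"]
    print(step_up_first_available(hosts, fake_send_step_up))  # prints recipient-1:27017
```

Injecting the transport as a callable keeps the retry policy separate from networking, so the selection order (here, sequential) can be swapped without touching the loop.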
| Comments |
| Comment by Githook User [ 13/May/22 ] |
Author: Didier Nadeau <didier.nadeau@mongodb.com> (nadeaudi)
Message: