[SERVER-66235] Clear sync source and buffer when applying recipient config during a shard split Created: 05/May/22 Updated: 29/Oct/23 Resolved: 10/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Didier Nadeau | Assignee: | Didier Nadeau |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
When recipient nodes apply the split config to join the new replica set, they do not immediately clear and reset their sync source and the buffer/fetcher/applier. As a result, the nodes keep receiving oplog entries from the donor replica set for some time after joining the recipient replica set, even after a primary has been elected in the recipient. This was discovered through the following bug:
Fix: `ReplicationCoordinatorImpl::shouldChangeSyncSource` is invoked on every batch of messages. When it receives the first batch following the split reconfig, this method returns `ChangeSyncSourceAction::kStopSyncingAndEnqueueLastBatch`. As the name implies, this resets the sync source but still processes the batch of messages received. This makes sense for a normal reconfig or primary change, since the node would have received committed oplog entries and should apply them even if the primary changed (the new primary would normally have these entries too). For a split, however, we want a clean break: no oplog entries received after the reconfig should be applied on the recipient. Therefore `ReplicationCoordinatorImpl::shouldChangeSyncSource` should return `ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent`. |
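The intended decision can be sketched as a small standalone model. This is not the server code: `SyncSourceCheck`, its two flags, and `chooseAction` are hypothetical names introduced for illustration, and the enum below is a simplified stand-in (the `kContinueSyncing` value is assumed; the other two enumerators are the ones named in this ticket).

```cpp
#include <cassert>

// Simplified stand-in for the server's ChangeSyncSourceAction enum.
// kContinueSyncing is an assumed "nothing to change" value.
enum class ChangeSyncSourceAction {
    kContinueSyncing,
    kStopSyncingAndEnqueueLastBatch,
    kStopSyncingAndDropLastBatchIfPresent,
};

// Hypothetical inputs; the real method inspects replication state instead.
struct SyncSourceCheck {
    bool syncSourceStillValid;         // false once a reconfig invalidates the source
    bool appliedSplitRecipientConfig;  // true once the split recipient config is applied
};

// Picks the action for the batch that was just fetched.
constexpr ChangeSyncSourceAction chooseAction(const SyncSourceCheck& check) {
    if (check.appliedSplitRecipientConfig) {
        // Shard split: clean break, so the last donor batch is discarded.
        return ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent;
    }
    if (!check.syncSourceStillValid) {
        // Normal reconfig or primary change: the batch holds committed
        // entries the new primary should also have, so it is still applied.
        return ChangeSyncSourceAction::kStopSyncingAndEnqueueLastBatch;
    }
    return ChangeSyncSourceAction::kContinueSyncing;
}

// Small self-check of the three cases described in the ticket.
inline void checkChooseAction() {
    assert(chooseAction(SyncSourceCheck{false, true}) ==
           ChangeSyncSourceAction::kStopSyncingAndDropLastBatchIfPresent);
    assert(chooseAction(SyncSourceCheck{false, false}) ==
           ChangeSyncSourceAction::kStopSyncingAndEnqueueLastBatch);
    assert(chooseAction(SyncSourceCheck{true, false}) ==
           ChangeSyncSourceAction::kContinueSyncing);
}
```

The split check comes first in this sketch so that a recipient never enqueues a donor batch, regardless of how the old sync source would otherwise be judged.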
| Comments |
| Comment by Githook User [ 10/May/22 ] |
|
Author: {'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}Message: |