[SERVER-53931] Investigate how to cancel recipients cloning/applying in resharding Created: 20/Jan/21 Updated: 29/Oct/23 Resolved: 15/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Haley Connelly | Assignee: | Haley Connelly |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-error-flow |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2021-02-22, Sharding 2021-03-08, Sharding 2021-03-22 |
| Participants: | |
| Story Points: | 1 |
| Description |
|
Goal: determine how to effectively interrupt a recipient's cloning/applying in resharding. It may be useful to look further into cancellation tokens and whether they could be used for this. |
| Comments |
| Comment by Githook User [ 15/Mar/21 ] |
|
Author: Haley Connelly (username: haleyConnelly, email: haley.connelly@mongodb.com)
Message: |
| Comment by Haley Connelly [ 02/Mar/21 ] |
|
This ticket will now focus on making sure cancellation tokens are passed to, and periodically checked by, the cloner, the oplog applier, the txn cloner, and the oplog fetcher. |
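As a rough illustration of what "passed and periodically checked" could look like, the sketch below polls a cancellation flag between units of work. It is an assumption-laden stand-in, not the actual cloner or oplog applier code: SketchCancelFlag and runBatches are hypothetical names, and a plain std::atomic<bool> takes the place of a real cancellation token.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a cancellation token (NOT the server's token type).
struct SketchCancelFlag {
    std::atomic<bool> canceled{false};
};

// Hypothetical worker loop: runs up to `total` batches and checks the flag
// before starting each one. Returns how many batches were processed before
// the loop finished or observed cancellation.
template <typename Fn>
std::size_t runBatches(std::size_t total, const SketchCancelFlag& flag, Fn&& processBatch) {
    std::size_t done = 0;
    while (done < total) {
        if (flag.canceled.load()) {
            break;  // stop between batches; an in-flight batch is never interrupted
        }
        processBatch(done);
        ++done;
    }
    return done;
}
```

In this sketch the check happens only at batch boundaries, so the batch size bounds how long a component keeps running after cancellation is requested.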
| Comment by Haley Connelly [ 09/Feb/21 ] |
|
For now, there isn't a POC with this because priorities have shifted elsewhere. |
| Comment by Haley Connelly [ 09/Feb/21 ] |
|
Summary of plan for cancellation tokens: The idea is that resharding state machines will have two cancellation tokens:
posToken: the token passed in from the PrimaryOnlyService instance's run method (e.g. ReshardingCoordinatorService::ReshardingCoordinator::run()).
abortToken: a token derived from a cancellation source that takes in the posToken.
When there is a stepdown, the posToken will be canceled. When there is an unrecoverable error, the resharding instance will cancel the abortToken. Similar to checkIfReceivedDonorAbortMigration() in tenant_migration, resharding should have a method that uses the tokens to differentiate between the two kinds of error:
Recoverable error (failover/stepdown)
Unrecoverable error (abort resharding operation entirely)
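The two-token plan can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the server's implementation: SketchState, SketchToken, SketchSource, Interruption, and classify are all hypothetical names invented here, and the stand-in classes only mimic the parent/child behavior described above (canceling the parent also cancels the derived token, but not the reverse).

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical shared state: a token counts as canceled if its own flag is
// set or if any ancestor's flag is set.
struct SketchState {
    std::atomic<bool> canceled{false};
    std::shared_ptr<const SketchState> parent;  // null for a root source
    bool isCanceled() const {
        return canceled.load() || (parent && parent->isCanceled());
    }
};

class SketchToken {
public:
    explicit SketchToken(std::shared_ptr<const SketchState> state) : _state(std::move(state)) {}
    bool isCanceled() const { return _state->isCanceled(); }

private:
    friend class SketchSource;
    std::shared_ptr<const SketchState> _state;
};

class SketchSource {
public:
    SketchSource() : _state(std::make_shared<SketchState>()) {}
    // Derived source: its tokens are also canceled when `parent` is canceled.
    explicit SketchSource(const SketchToken& parent) : _state(std::make_shared<SketchState>()) {
        _state->parent = parent._state;
    }
    void cancel() { _state->canceled.store(true); }
    SketchToken token() const { return SketchToken(_state); }

private:
    std::shared_ptr<SketchState> _state;
};

enum class Interruption { kNone, kRecoverable, kUnrecoverable };

// Hypothetical classifier. posToken is checked first because a stepdown
// cancels both tokens, while an abort cancels only abortToken.
Interruption classify(const SketchToken& posToken, const SketchToken& abortToken) {
    if (posToken.isCanceled()) {
        return Interruption::kRecoverable;  // failover/stepdown
    }
    if (abortToken.isCanceled()) {
        return Interruption::kUnrecoverable;  // abort resharding entirely
    }
    return Interruption::kNone;
}
```

The order of the checks in classify is the design point worth noting: because abortToken is derived from posToken, observing abortToken alone cannot distinguish a stepdown from an abort.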
|
| Comment by Max Hirschhorn [ 21/Jan/21 ] |
|
Marking this as 1 point so that no more than 1 week is spent on it before reporting findings to the group. |