[SERVER-31922] Make the migration chunk cloner source resilient to stepdowns and network errors Created: 10/Nov/17 Updated: 06/Dec/22 Resolved: 29/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding |
| Linked BF Score: | 20 |
| Description |
|
On the donor shard, MigrationChunkClonerSourceLegacy::startClone uses the private member function _callRecipient to call the recipient, and _callRecipient in turn uses the task executor to send the command. The task executor does not retry NotMaster errors, so a stepdown on the recipient fails the call. One solution would be to use a ShardRemote instead and allow NotMaster errors to be retried for the first command, _recvChunkStart. We don't want to retry every command this way, but the first one is safe, I think. |
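The proposal above can be illustrated with a minimal sketch. It is not the server's C++ code: `NotMasterError`, `call_recipient`, `FlakyRecipient`, and the command name `someLaterCommand` are hypothetical stand-ins, and the sketch does not model how a new primary would be located after a stepdown. It only shows the policy of retrying NotMaster for `_recvChunkStart` while letting any other command fail on the first error.

```python
class NotMasterError(Exception):
    """Hypothetical stand-in for a NotMaster response from the recipient."""


# Assumption from the ticket: only the first command is safe to retry.
RETRYABLE_COMMANDS = frozenset({"_recvChunkStart"})


def call_recipient(send, command, max_attempts=3):
    """Send `command` via `send`, retrying NotMaster only for allow-listed commands."""
    attempts = max_attempts if command in RETRYABLE_COMMANDS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send(command)
        except NotMasterError:
            # Out of attempts (or command not retryable): surface the error.
            if attempt == attempts:
                raise


class FlakyRecipient:
    """Hypothetical recipient that answers NotMaster for the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def send(self, command):
        self.calls += 1
        if self.calls <= self.failures:
            raise NotMasterError(command)
        return {"ok": 1}
```

With this policy, a recipient that steps down twice before accepting `_recvChunkStart` still receives the command on the third attempt, whereas a NotMaster response to any other command propagates to the caller immediately.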
| Comments |
| Comment by Ratika Gandhi [ 29/Jul/19 ] |
|
Low priority. Please reopen if this is required. |