[SERVER-40859] Orphaned collections after a movePrimary that fails to cleanup can cause future migrations to fail Created: 26/Apr/19 Updated: 13/Jan/23 Resolved: 13/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.1.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | moveprimary-resiliency, pm-1051-legacy-tickets, sharding-causes-bfs-hard |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding EMEA |
| Operating System: | ALL |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
If the movePrimary command is interrupted after it successfully runs configsvrCommitMovePrimary but before it finishes dropping all the cloned collections, the leftover collections can cause errors when chunks for those collections are eventually moved to this shard. This is because movePrimary assigns new UUIDs to the cloned collections, so the UUIDs of the "orphaned" collections and of the new sharded collections will differ. |
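The UUID mismatch described above can be illustrated with a minimal sketch. It compares two namespace-to-UUID maps and reports namespaces whose local copy carries a different UUID from the authoritative one. The function name `find_uuid_mismatches`, the namespaces, and the UUID values are all hypothetical; in a real cluster the two maps might be built from each shard's collection listing and from the sharding metadata, but that is an assumption, not something this ticket specifies.

```python
import uuid


def find_uuid_mismatches(local_uuids, authoritative_uuids):
    """Return the sorted namespaces that exist in both maps with different UUIDs.

    A namespace in this list is a candidate "orphaned" collection: a local
    copy left behind whose UUID no longer matches the authoritative one.
    """
    return sorted(
        ns
        for ns, local_uuid in local_uuids.items()
        if ns in authoritative_uuids and authoritative_uuids[ns] != local_uuid
    )


# Hypothetical data: "test.orders" was left behind with a stale UUID,
# while "test.users" is consistent on both sides.
local = {
    "test.orders": uuid.UUID(int=2),
    "test.users": uuid.UUID(int=3),
}
authoritative = {
    "test.orders": uuid.UUID(int=1),
    "test.users": uuid.UUID(int=3),
}

mismatches = find_uuid_mismatches(local, authoritative)
print(mismatches)
```

A check of this kind only flags candidates; whether a flagged collection is safe to drop depends on the state of the cluster.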
| Comments |
| Comment by Tommaso Tocci [ 13/Jan/23 ] |
|
This problem has been fixed by |
| Comment by Antonio Fuschetto [ 12/Jan/23 ] |
|
The problem described here is resolved by the resilient movePrimary, which will be introduced in version 7.0. If the operation is interrupted before the cloned collections are dropped, the DDL coordinator resumes the failed phase, thus ensuring the removal of the local stale data. There is currently no plan to backport the resilient movePrimary, so lower versions are still affected by the problem. |
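The resume-the-failed-phase behavior mentioned in this comment can be sketched generically. This is not the server's DDL coordinator: the class `ResumableCoordinator`, the phase names, and the in-memory `state` dictionary (standing in for durably persisted progress) are all hypothetical, chosen only to show how recording completed phases lets a retry pick up at the interrupted cleanup step.

```python
class ResumableCoordinator:
    """Runs named phases in order, skipping those already recorded as done.

    `state` stands in for durably persisted progress (hypothetical); a real
    system would need to persist it so that it survives a failover.
    """

    def __init__(self, phases, state):
        self.phases = phases  # list of (name, callable) pairs
        self.state = state    # {"completed": [phase names]}

    def run(self):
        for name, action in self.phases:
            if name in self.state["completed"]:
                continue  # finished in an earlier attempt
            action()      # may raise; the phase is then not marked complete
            self.state["completed"].append(name)


log = []
drop_attempts = {"count": 0}


def clone():
    log.append("clone")


def commit():
    log.append("commit")


def drop_stale():
    # Simulate an interruption on the first attempt only.
    drop_attempts["count"] += 1
    if drop_attempts["count"] == 1:
        raise RuntimeError("simulated interruption")
    log.append("drop_stale")


phases = [("clone", clone), ("commit", commit), ("drop_stale", drop_stale)]
state = {"completed": []}

try:
    ResumableCoordinator(phases, state).run()
except RuntimeError:
    pass  # the first attempt stops during cleanup

# A second attempt with the same state resumes at the failed phase.
ResumableCoordinator(phases, state).run()
print(log)
```

In the sketch, the clone and commit phases run once, and only the cleanup phase is retried, so the stale data is removed without redoing earlier work.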
| Comment by Kaloian Manassiev [ 13/Jul/22 ] |
|
CC antonio.fuschetto@mongodb.com who I believe is working in that area now. |
| Comment by Tommaso Tocci [ 13/Jul/22 ] |
|
According to kaloian.manassiev@mongodb.com this will be fixed as part of PM-2144, in which the movePrimary coordinator will be restructured to be resilient to stepdowns.