[SERVER-40859] Orphaned collections after a movePrimary that fails to cleanup can cause future migrations to fail Created: 26/Apr/19  Updated: 13/Jan/23  Resolved: 13/Jan/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.1.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Duplicate Votes: 0
Labels: moveprimary-resiliency, pm-1051-legacy-tickets, sharding-causes-bfs-hard
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-71202 Adapt existing movePrimary logic to n... Closed
Related
related to SERVER-54231 Resharding can leave behind local col... Closed
Assigned Teams:
Sharding EMEA
Operating System: ALL
Linked BF Score: 0

 Description   

If the movePrimary command is interrupted after it successfully runs configsvrCommitMovePrimary but before it finishes dropping all of the cloned collections, the leftover ("orphaned") collections can cause errors when chunks of those collections are later moved to this shard. This happens because movePrimary assigns new UUIDs to the cloned collections, so the UUID of each orphaned collection differs from the UUID of the corresponding sharded collection.
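The failure mode can be modelled with a toy Python sketch. Everything in it is hypothetical and heavily simplified: `Shard`, `move_primary`, `receive_chunk`, `UUIDMismatch` and `migration_fails` are illustrative names, not server code. The sketch assumes the leftover copies sit on the old primary shard, which later receives a chunk of the now-sharded collection and checks the incoming UUID against its local catalog.

```python
import uuid
from dataclasses import dataclass, field


class UUIDMismatch(Exception):
    """Raised when a shard's local collection UUID differs from the expected one."""


@dataclass
class Shard:
    name: str
    # Simplified local catalog (hypothetical): namespace -> collection UUID.
    collections: dict = field(default_factory=dict)


def move_primary(donor, recipient, namespaces, interrupted_before_cleanup=False):
    # Clone phase: each copy on the recipient gets a NEW UUID.
    for ns in namespaces:
        recipient.collections[ns] = uuid.uuid4()
    # Commit point (stand-in for configsvrCommitMovePrimary): recipient is now primary.
    if interrupted_before_cleanup:
        return  # the donor's copies are left behind as orphans
    # Cleanup phase: drop the donor's now-stale copies.
    for ns in namespaces:
        donor.collections.pop(ns, None)


def receive_chunk(shard, ns, expected_uuid):
    # Hypothetical recipient-side check when a chunk of a sharded collection arrives.
    local_uuid = shard.collections.get(ns)
    if local_uuid is None:
        shard.collections[ns] = expected_uuid  # create it with the sharded UUID
    elif local_uuid != expected_uuid:
        raise UUIDMismatch(f"{shard.name}: {ns} has UUID {local_uuid}, expected {expected_uuid}")


def migration_fails(interrupted):
    old_primary = Shard("shardA", {"test.foo": uuid.uuid4()})
    new_primary = Shard("shardB")
    move_primary(old_primary, new_primary, ["test.foo"], interrupted)
    # Later, test.foo is sharded (keeping the new primary's UUID) and a chunk
    # is migrated back to the old primary shard.
    try:
        receive_chunk(old_primary, "test.foo", new_primary.collections["test.foo"])
    except UUIDMismatch:
        return True
    return False
```

In this model `migration_fails(True)` is true because the orphan keeps its original UUID, while `migration_fails(False)` is false because the shard has no local copy and simply creates one with the sharded collection's UUID.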



 Comments   
Comment by Tommaso Tocci [ 13/Jan/23 ]

This problem has been fixed by SERVER-71202. Please refer to SERVER-71202 for further information about the fix.

Comment by Antonio Fuschetto [ 12/Jan/23 ]

The problem described here is resolved by the resilient movePrimary, which will be introduced in version 7.0. If the operation is interrupted before the cloned collections are dropped, the DDL coordinator resumes the failed phase, thus ensuring that the stale local data is removed.
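A rough sketch of the resumable approach described above, assuming a coordinator that records its current phase in a durable state document (modelled here as a plain dict) and advances that phase only after the current one completes. The phase names, `run_coordinator` and `StepDown` are hypothetical; this is not the actual 7.0 DDL coordinator.

```python
import uuid
from enum import IntEnum


class Phase(IntEnum):
    CLONE = 0
    COMMIT = 1
    CLEAN_STALE_DATA = 2
    DONE = 3


class StepDown(Exception):
    """Simulated interruption, e.g. the coordinating node stepping down."""


def run_coordinator(state, donor, recipient, namespaces, crash_after_drops=None):
    """Hypothetical resumable movePrimary coordinator.

    `state` stands in for a durable state document; `donor` and `recipient`
    map namespace -> UUID. Each phase is idempotent and the stored phase is
    advanced only after a phase completes, so calling this again after an
    interruption resumes at the phase that did not finish.
    """
    phase = state.setdefault("phase", Phase.CLONE)

    if phase <= Phase.CLONE:
        for ns in namespaces:
            # Clone under a fresh UUID; namespaces already cloned are skipped.
            recipient.setdefault(ns, uuid.uuid4())
        state["phase"] = phase = Phase.COMMIT

    if phase <= Phase.COMMIT:
        state["primary"] = "recipient"  # stand-in for the config server commit
        state["phase"] = phase = Phase.CLEAN_STALE_DATA

    if phase <= Phase.CLEAN_STALE_DATA:
        dropped = 0
        for ns in namespaces:
            if crash_after_drops is not None and dropped == crash_after_drops:
                raise StepDown("interrupted while dropping stale collections")
            # pop(ns, None) keeps the drop idempotent when the phase is re-run.
            if donor.pop(ns, None) is not None:
                dropped += 1
        state["phase"] = phase = Phase.DONE
```

If the first run raises `StepDown` partway through cleanup, the stored phase is still `CLEAN_STALE_DATA`, so a second call skips cloning and committing and finishes dropping the remaining donor collections instead of leaving them orphaned.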

There is currently no plan to backport the resilient movePrimary, so earlier versions are still affected by the problem.

Comment by Kaloian Manassiev [ 13/Jul/22 ]

CC antonio.fuschetto@mongodb.com who I believe is working in that area now.

Comment by Tommaso Tocci [ 13/Jul/22 ]

According to kaloian.manassiev@mongodb.com, this will be fixed as part of PM-2144, in which the movePrimary coordinator will be restructured to be resilient to stepdowns.

Generated at Thu Feb 08 04:56:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.