[SERVER-32142] `movePrimary` can leave orphaned data when it aborts after cloning Created: 01/Dec/17 Updated: 06/Dec/22 Resolved: 09/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.18, 3.4.10, 3.6.0, 3.7.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | pm-1051-legacy-tickets, sharding-causes-bfs-hard | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 27 |
| Description |
|
If a movePrimary command clones the database but fails to complete (for example, because of a stepdown), it leaves the database and its unsharded collections behind on the original shard. Calling the command again does nothing, because config.databases already records the new shard as the database's primary, so the unsharded collections remain orphaned on the old primary shard. In a variant of this failure, the command aborts after cloning succeeds but before it updates config.databases. Retrying the command then makes it clone again, which fails with a "collection already exists" error.
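Both failure windows come from the command running as separate, non-atomic steps. The Python sketch below models that ordering under a stated assumption: a simplified three-step sequence (clone the data, update config.databases, drop the source copies). The cleanup step and every class and function name (`Shard`, `CollectionExists`, `move_primary`, `crash_after`) are hypothetical illustrations, not MongoDB's actual implementation.

```python
"""Toy model of the two movePrimary failure modes described in SERVER-32142.

Every name here (Shard, CollectionExists, move_primary, crash_after) is
hypothetical and only illustrates the step ordering; it is not MongoDB code.
"""


class CollectionExists(Exception):
    """Raised when a clone target already holds a collection of the same name."""


class Shard:
    def __init__(self, name, collections=()):
        self.name = name
        self.collections = set(collections)

    def clone_from(self, source):
        # Copy every unsharded collection; refuse to overwrite existing ones.
        for coll in sorted(source.collections):
            if coll in self.collections:
                raise CollectionExists(f"{coll} already exists on {self.name}")
            self.collections.add(coll)


def move_primary(config_databases, db, shards, to_shard, crash_after=None):
    """Move db's unsharded collections to to_shard in three non-atomic steps.

    crash_after simulates an abort (for example a stepdown) right after the
    named step: "clone" or "commit".
    """
    source = shards[config_databases[db]]
    target = shards[to_shard]
    if source is target:
        return "no-op"  # metadata already names to_shard as the primary
    target.clone_from(source)                 # step 1: clone the data
    if crash_after == "clone":
        raise RuntimeError("aborted before updating config.databases")
    config_databases[db] = to_shard           # step 2: commit the new primary
    if crash_after == "commit":
        raise RuntimeError("aborted before dropping the source collections")
    source.collections.clear()                # step 3 (assumed): clean up old shard
    return "moved"


def run_scenarios():
    # Scenario 1: abort after the commit; a retry is a no-op and the old
    # shard keeps orphaned copies of every unsharded collection.
    shards = {"A": Shard("A", {"users", "orders"}), "B": Shard("B")}
    config = {"app": "A"}
    try:
        move_primary(config, "app", shards, "B", crash_after="commit")
    except RuntimeError:
        pass
    retry1 = move_primary(config, "app", shards, "B")
    orphans = sorted(shards["A"].collections)

    # Scenario 2: abort after the clone but before the commit; a retry
    # clones again and fails because the target already has the data.
    shards = {"A": Shard("A", {"users"}), "B": Shard("B")}
    config = {"app": "A"}
    try:
        move_primary(config, "app", shards, "B", crash_after="clone")
    except RuntimeError:
        pass
    try:
        retry2 = move_primary(config, "app", shards, "B")
    except CollectionExists as exc:
        retry2 = str(exc)
    return retry1, orphans, retry2


if __name__ == "__main__":
    for result in run_scenarios():
        print(result)
```

Running it prints `no-op`, `['orders', 'users']`, and `users already exists on B`, one line per result, matching the two scenarios in the description.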
| Comments |
| Comment by Kaloian Manassiev [ 09/Oct/20 ] |
|
This is a deficiency of MovePrimary, which we will not address; instead, we plan to make it use the moveChunk functionality for unsharded collections.
| Comment by Kaloian Manassiev [ 30/Sep/20 ] |
|
Moving to Needs Triage to make an official decision on closing this ticket.
| Comment by Randolph Tan [ 21/Dec/17 ] |
|
Just realized there's already a ticket for the variant failure - 
| Comment by Randolph Tan [ 01/Dec/17 ] |
| Comment by Kaloian Manassiev [ 01/Dec/17 ] |
|
renctan, is this any different than movePrimary failing in any version prior to 3.6? |