[SERVER-62691] Remove shard does not wait for migrations to finish on the drained shard Created: 17/Jan/22 Updated: 06/Dec/22 Resolved: 18/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marcos José Grillo Ramirez | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Sharding EMEA
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
The removeShard command only checks locally on the config server that the shard being removed no longer owns any chunks. However, this check can pass right after the last migration commits the chunk on the config server but before that migration has finished its cleanup. As a result, important persistence cleanup tasks, such as starting the donor shard's range deletion, removing the recipient shard's range deletion document, and even removing the coordinator document, might never be executed if a user shuts down the shard immediately after receiving a successful result from removeShard. removeShard should check with the draining shard that all migrations have finished and were successful. |
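The race described above can be illustrated with a small in-memory model. This is only a sketch, not server code: `ConfigServer`, `DrainingShard`, `pending_migration_cleanups`, and both `can_remove_*` functions are hypothetical names invented for illustration, and the model assumes the draining shard can report which committed migrations still have cleanup outstanding.

```python
from dataclasses import dataclass, field


@dataclass
class DrainingShard:
    """Hypothetical stand-in for state held on the shard being removed."""
    # Migrations that committed on the config server but whose cleanup
    # (range deletion tasks, coordinator document removal) is not done yet.
    pending_migration_cleanups: set = field(default_factory=set)


@dataclass
class ConfigServer:
    """Hypothetical stand-in for the config server's chunk ownership view."""
    chunks_by_shard: dict = field(default_factory=dict)

    def owns_no_chunks(self, shard_id: str) -> bool:
        return self.chunks_by_shard.get(shard_id, 0) == 0


def can_remove_local_check_only(config: ConfigServer, shard_id: str) -> bool:
    # Behaviour described in the ticket: only the config server is consulted.
    return config.owns_no_chunks(shard_id)


def can_remove_with_shard_check(
    config: ConfigServer, shard_id: str, shard: DrainingShard
) -> bool:
    # Proposed behaviour (assumed shape): also require that the draining
    # shard reports no migration with outstanding cleanup.
    return config.owns_no_chunks(shard_id) and not shard.pending_migration_cleanups


# The last migration has committed (zero chunks left) but cleanup is pending.
config = ConfigServer(chunks_by_shard={"shardA": 0})
shard_a = DrainingShard(pending_migration_cleanups={"migration-1"})

local_only = can_remove_local_check_only(config, "shardA")
with_shard = can_remove_with_shard_check(config, "shardA", shard_a)

# Once cleanup completes, the stricter check passes as well.
shard_a.pending_migration_cleanups.clear()
after_cleanup = can_remove_with_shard_check(config, "shardA", shard_a)
```

In this model the local-only check succeeds while cleanup is still outstanding, which is the window in which shutting down the shard would lose the cleanup tasks; the stricter check succeeds only after the pending set is empty.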
| Comments |
| Comment by Marcos José Grillo Ramirez [ 18/Jan/22 ] |
|
esha.maharishi yes, it is the same as SERVER-50144, I'll mark it as a duplicate. |
| Comment by Esha Maharishi (Inactive) [ 18/Jan/22 ] |
|
marcos.grillo, just a heads up that this may be a dupe of SERVER-50144 and/or SERVER-50146. |