[SERVER-51132] Ensure that resharding participants have removed all disk metadata after having completed their portion of the resharding operation Created: 24/Sep/20 Updated: 29/Oct/23 Resolved: 07/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Janna Golden | Assignee: | Alexander Taskov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M2, PM-234-T-lifecycle | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2020-11-30, Sharding 2020-12-14 |
| Participants: | |
| Story Points: | 1 |
| Description |
|
This ticket will only involve making sure donors/recipients/coordinators have removed all disk metadata after having completed the resharding operation. This should include:
Note that for this ticket and Milestone 2 in general, you don't need to worry about handling any error state cleanup. |
| Comments |
| Comment by Githook User [ 07/Dec/20 ] |
|
Author: {'name': 'Alex Taskov', 'email': 'alex.taskov@mongodb.com', 'username': 'alextaskov'} Message: |
| Comment by Haley Connelly [ 17/Nov/20 ] |
|
Nope, good to go! |
| Comment by Blake Oler [ 13/Nov/20 ] |
|
haley.connelly I'm moving this over to alex.taskov. If you have any in-progress work, could you post it in a code review? |
| Comment by Haley Connelly [ 05/Nov/20 ] |
|
While investigating this ticket, I found the following bug and filed it: when the ReshardingCoordinatorService tries to persist its transition to kDone, it does so via resharding::persistStateTransitionAndCatalogUpdatesThenBumpShardVersions. However, at this point we do not want to bump the shard version of the collection (see notifyForStateTransition), so we will hit an invariant. Our current testing does not catch this because it tests the transition to kDone by calling removeCoordinatorDocAndReshardingFields, a function that is only called by the test and that no longer accurately mirrors the coordinator's code flow since
|
| Comment by Blake Oler [ 03/Nov/20 ] |
|
The ticket has been updated to reflect what the final decision on this was. |
| Comment by Janna Golden [ 09/Oct/20 ] |
|
This came up in a conversation with max.hirschhorn a couple of weeks ago. I agree that the participants should clean up when they transition to done, but the coordinator should be the one to tell them to transition to done (this is important for recovery) - this should happen only once the coordinator is sure that all recipients have successfully renamed the collection and all donors have successfully dropped the collection. |
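The ordering described in this comment can be illustrated with a minimal sketch. It is only a toy model, not the server's implementation: the `Participant` and `Coordinator` types, the `ParticipantState` values, and the `finishLocalWork`, `transitionToDone`, and `tryFinish` functions are all hypothetical names invented for illustration, and "disk metadata" is modeled as an in-memory set. The sketch assumes a participant removes its metadata only when the coordinator tells it to transition to done, and that the coordinator does so only after every recipient has renamed and every donor has dropped the collection.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical states; these are illustrative, not the server's actual enums.
enum class ParticipantState { kInProgress, kFinishedLocalWork, kDone };

// Hypothetical stand-in for a donor or recipient shard.
struct Participant {
    ParticipantState state = ParticipantState::kInProgress;
    // Stand-in for the on-disk resharding metadata held by this shard.
    std::set<std::string> diskMetadata{"stateDocument"};

    // The shard has renamed (recipient) or dropped (donor) the collection.
    void finishLocalWork() { state = ParticipantState::kFinishedLocalWork; }

    // Invoked only on instruction from the coordinator; removes all metadata.
    void transitionToDone() {
        assert(state == ParticipantState::kFinishedLocalWork);
        diskMetadata.clear();
        state = ParticipantState::kDone;
    }
};

// Hypothetical coordinator that gates the transition to done.
struct Coordinator {
    std::map<std::string, Participant> donors;
    std::map<std::string, Participant> recipients;

    // True once no shard in the group is still working.
    static bool allFinished(const std::map<std::string, Participant>& shards) {
        for (const auto& entry : shards) {
            if (entry.second.state == ParticipantState::kInProgress) return false;
        }
        return true;
    }

    // Tells every shard in the group that has not yet cleaned up to do so.
    static void finishAll(std::map<std::string, Participant>& shards) {
        for (auto& entry : shards) {
            if (entry.second.state == ParticipantState::kFinishedLocalWork) {
                entry.second.transitionToDone();
            }
        }
    }

    // Returns true if the participants were told to transition to done.
    // Safe to call again after a restart: shards already done are skipped.
    bool tryFinish() {
        if (!allFinished(donors) || !allFinished(recipients)) return false;
        finishAll(donors);
        finishAll(recipients);
        return true;
    }
};
```

Because `tryFinish` skips shards that are already done, calling it a second time is harmless, which is the property that matters if the coordinator has to repeat the step during recovery.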
| Comment by Blake Oler [ 09/Oct/20 ] |
|
janna.golden I'm not sure this ticket is necessary. The participant shards clean themselves up as part of their own transition to done: the recipients after they've renamed the collection and the donors after they've dropped it. So as it currently stands, we have already satisfied the requirement that shards clean up before the coordinator begins cleaning up. |