[SERVER-51132] Ensure that resharding participants have removed all disk metadata after having completed their portion of the resharding operation
Created: 24/Sep/20  Updated: 29/Oct/23  Resolved: 07/Dec/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task Priority: Major - P3
Reporter: Janna Golden Assignee: Alexander Taskov (Inactive)
Resolution: Fixed Votes: 0
Labels: PM-234-M2, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-11-30, Sharding 2020-12-14
Participants:
Story Points: 1

 Description   

This ticket will only involve making sure donors/recipients/coordinators have removed all disk metadata after having completed the resharding operation.

This should include:

  • Removing the reshardingFields parameter from the original config.collections entry on the coordinator after the coordinator has transitioned to kDone.
  • Removing the config.reshardingOperations document for the current resharding operation after the coordinator has transitioned to kDone.
  • Removing the config.localReshardingOperations documents for the current resharding operation on recipients/donors after the recipients/donors have transitioned to kDone.
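
The three cleanup steps above can be sketched as follows. This is a minimal illustration only, not the server's implementation (which is C++): the collections are modeled as in-memory dicts keyed by _id, and the function names, parameters, and the string form of the state are all hypothetical. Only the collection names, the reshardingFields field, and the kDone state come from this ticket.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for an on-disk collection: maps _id -> document.
Collection = Dict[str, Dict[str, Any]]


def cleanup_coordinator(
    config_collections: Collection,
    resharding_operations: Collection,
    original_ns: str,
    op_id: str,
    coordinator_state: str,
) -> bool:
    """Remove coordinator metadata, but only once the coordinator is in kDone."""
    if coordinator_state != "kDone":
        return False
    # Step 1: drop reshardingFields from the original config.collections entry.
    config_collections.get(original_ns, {}).pop("reshardingFields", None)
    # Step 2: delete the config.reshardingOperations document for this operation.
    resharding_operations.pop(op_id, None)
    return True


def cleanup_participant(
    local_resharding_operations: Collection,
    op_id: str,
    participant_state: str,
) -> bool:
    """Remove a donor's or recipient's local document once it is in kDone."""
    if participant_state != "kDone":
        return False
    # Step 3: delete the config.localReshardingOperations document.
    local_resharding_operations.pop(op_id, None)
    return True
```

Both functions are written to be idempotent (removing an already-absent field or document is a no-op), which is an assumption of this sketch rather than something the ticket specifies.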

Note that for this ticket and Milestone 2 in general, you don't need to worry about handling any error state cleanup.



 Comments   
Comment by Githook User [ 07/Dec/20 ]

Author: Alex Taskov <alex.taskov@mongodb.com> (username: alextaskov)

Message: SERVER-51132 Ensure that resharding participants have removed all disk metadata after having completed their portion of the resharding operation
Branch: master
https://github.com/mongodb/mongo/commit/5c6a908f1a889464d667d89f429455e9ce9248ed

Comment by Haley Connelly [ 17/Nov/20 ]

Nope, good to go!

Comment by Blake Oler [ 13/Nov/20 ]

haley.connelly I'm moving this over to alex.taskov. If you have any in-progress work, could you post it in a code review?

Comment by Haley Connelly [ 05/Nov/20 ]

While investigating this ticket, I found the following bug and filed SERVER-52653:

When the ReshardingCoordinatorService tries to persist its transition to kDone, it does so via resharding::persistStateTransitionAndCatalogUpdatesThenBumpShardVersions. However, at this point, we do not want to bump the shard version of the collection (see notifyForStateTransition), and doing so will hit an invariant.

Our current testing does not catch this because it tests the transition to kDone by calling removeCoordinatorDocAndReshardingFields, a function that is called only by the test and that has not accurately mirrored the coordinator's code flow since SERVER-51291.

 

Comment by Blake Oler [ 03/Nov/20 ]

The ticket has been updated to reflect what the final decision on this was.

Comment by Janna Golden [ 09/Oct/20 ]

This came up in a conversation with max.hirschhorn a couple of weeks ago. I agree that the participants should clean up when they transition to done, but the coordinator should be the one to tell them to transition to done (this is important for recovery) - this should happen only once the coordinator is sure that all recipients have successfully renamed the collection and all donors have successfully dropped the collection.
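
The ordering described here can be sketched as a simple gate on the coordinator. This is a minimal illustration under assumptions: the function name and the per-shard boolean maps are hypothetical, and the real coordinator tracks participant progress through its own state documents rather than through dicts like these.

```python
from typing import Dict


def may_tell_participants_done(
    recipient_renamed: Dict[str, bool],
    donor_dropped: Dict[str, bool],
) -> bool:
    """Return True only when every recipient has renamed and every donor has dropped.

    Keys are shard ids (hypothetical). all() over an empty mapping is True,
    so callers are assumed to pass an entry for every participant.
    """
    return all(recipient_renamed.values()) and all(donor_dropped.values())
```

For example, with recipients {"shardA": True, "shardB": False} the gate stays closed no matter what the donors report, so no participant would be told to transition to done (and clean up) yet.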

Comment by Blake Oler [ 09/Oct/20 ]

janna.golden I'm not sure if this ticket is necessary. The participant shards will clean themselves up as part of transitioning to done: the recipients after they've renamed, and the donors after they've dropped. So as it currently stands, we have already ensured that shards will clean up before the coordinator begins cleaning up.

Generated at Thu Feb 08 05:24:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.