[SERVER-60291] Resharding Prohibited Commands Does Not Wait For The Recipient To Be Done Created: 28/Sep/21  Updated: 29/Oct/23  Resolved: 05/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Steps To Reproduce:
  1. ReshardingRecipientService marks itself as done
  2. Put a sleep on the ReshardingCoordinatorService such that it never tells the participants to refresh
  3. Run the resharding_prohibited_commands.js test
  4. Because the test relies on cache.collections, it will continue as if the recipient has finished. But because the CollectionMetadata hasn't been refreshed, the operation in the second postDecisionPersisted will not be allowed to go through
Sprint: Sharding 2021-10-04, Sharding 2021-10-18
Participants:
Linked BF Score: 35
Story Points: 1

 Description   

Context
When the ReshardingRecipientService marks itself as done, it updates its local in-memory representation of the state. But that does not make the rest of the recipient's in-memory state aware that the recipient is finished, because the rest of the recipient uses the CollectionMetadata to determine the state of the operation.

The recipient's CollectionMetadata isn't updated until the coordinator sends a command to the participants of the resharding operation telling them to refresh.

So any call to throwIfReshardingInProgress will not necessarily reflect the state of the ReshardingRecipientService; instead it reflects the state of the recipient as of the last refresh.
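The gap described above can be illustrated with a toy model in plain JavaScript. This is only a sketch under stated assumptions: RecipientModel, its fields, and the state strings are hypothetical stand-ins invented for illustration, and the method named throwIfReshardingInProgress here merely mimics the idea of the server function, not its real logic.

```javascript
// Toy model only: NOT the server's implementation. All names and state
// strings below are hypothetical, chosen to mirror the ticket's wording.
class RecipientModel {
  constructor() {
    // In-memory state owned by the (modelled) ReshardingRecipientService.
    this.serviceState = "applying";
    // Stand-in for the reshardingFields held in the CollectionMetadata.
    this.cachedReshardingFields = { recipientState: "applying" };
  }

  // The service marks itself done; the cached metadata is left untouched.
  markDone() {
    this.serviceState = "done";
  }

  // Models the refresh that the coordinator tells participants to perform.
  refresh() {
    this.cachedReshardingFields = { recipientState: this.serviceState };
  }

  // Models a check that consults only the cached metadata.
  throwIfReshardingInProgress() {
    if (this.cachedReshardingFields.recipientState !== "done") {
      throw new Error("resharding still in progress (per cached metadata)");
    }
  }
}

const recipient = new RecipientModel();
recipient.markDone();

let rejectedBeforeRefresh = false;
try {
  recipient.throwIfReshardingInProgress();
} catch (e) {
  rejectedBeforeRefresh = true; // the check still sees the pre-refresh state
}

recipient.refresh();
recipient.throwIfReshardingInProgress(); // passes once the metadata is refreshed
```

In this model the check is rejected between markDone() and refresh(), which is the window the ticket describes.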

The Problem

The resharding_prohibited_commands.js test relies on updates to the cache.collections collection (which has caused problems in the past, see SERVER-59694), so it will allow all of the commands that are prohibited during resharding to be executed before the CollectionMetadata of the recipient has been updated.

Hence, when the collMod command was executed, the throwIfReshardingInProgress function received reshardingFields that had not been updated to reflect the move to done by the ReshardingRecipientService.

Possible Solution
The test needs to be updated to not rely on cache.collections, as that doesn't have a strong relation to when the recipient is actually done.

We should either join the resharding operation and wait until it is done, or take some other measure to determine whether or not the recipient is fully "done".
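One way to picture "some other measure" is to poll a condition tied to the recipient's refreshed metadata rather than to the cache entry. The following is a minimal sketch in plain JavaScript, not the fix that was committed: waitUntil, sim, and tick are hypothetical names, and the simulated state simply assumes the cache entry flips before the metadata does.

```javascript
// Hypothetical helper: check `isDone` up to `maxAttempts` times, calling
// `tick` after each failed check so the simulated system can make progress.
// Returns the attempt number on which the condition first held.
function waitUntil(isDone, tick, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (isDone()) {
      return attempt;
    }
    tick();
  }
  throw new Error(`condition not met after ${maxAttempts} attempts`);
}

// Toy stand-ins (assumptions): the cache entry already says "done", while
// the metadata only catches up on the third tick.
const sim = { cacheEntryDone: true, metadataDone: false, ticks: 0 };
const tick = () => {
  sim.ticks += 1;
  if (sim.ticks >= 3) {
    sim.metadataDone = true;
  }
};

// Waiting on the cache entry would return immediately, while the metadata
// is still stale; waiting on the metadata itself closes that gap.
const attempts = waitUntil(() => sim.metadataDone, tick, 10);
```

The design point is only that the waited-on condition should be the same state the prohibited-command check consults; which real signal plays that role in the test is left open here.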



 Comments   
Comment by Githook User [ 07/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-60291 Move success case to outside of withReshardingInBackground
Branch: v5.0
https://github.com/mongodb/mongo/commit/56ea1c7d56fd520a99e4dc386fbb57af1b51ba7e

Comment by Githook User [ 06/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-60291 Move success case to outside of withReshardingInBackground
Branch: v5.1
https://github.com/mongodb/mongo/commit/dded3f27cb82437472885c8555c302e09e114e1e

Comment by Githook User [ 05/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-60291 Move success case to outside of withReshardingInBackground
Branch: master
https://github.com/mongodb/mongo/commit/6d94bfbcecaed86ba2d9e9491a1f68dd34d1fc6c

Comment by Luis Osta (Inactive) [ 01/Oct/21 ]

So after talking with janna.golden, it became clear that just waiting for the resharding operation to complete won't actually maintain the contract the test is supposed to have. The test checks that the commands are allowed right after the coordinator persists the decision, even if the full operation hasn't completed yet.

Generated at Thu Feb 08 05:49:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.