[SERVER-62720] _configsvrReshardCollection can fail to join existing operation Created: 18/Jan/22  Updated: 29/Oct/23  Resolved: 20/Sep/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Brett Nawrocki Assignee: Abdul Qadeer
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-70746 _configsvrReshardCollection Will Not ... Closed
Related
related to SERVER-78604 ReshardingCoordinatorService Index bu... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.1, v6.0, v5.0
Sprint: Sharding 2022-08-08, Sharding 2022-08-22, Sharding 2022-09-05, Sharding 2022-09-19, Sharding 2022-10-03
Participants:
Linked BF Score: 60
Story Points: 4

 Description   

Currently, the _configsvrReshardCollection command relies on its own getExistingInstanceToJoin() function to determine whether an existing ReshardingCoordinator is already executing the same resharding operation. This function fetches and iterates over the existing ReshardingCoordinator instances rather than performing the check atomically by overriding checkIfConflictsWithOtherInstances(). As a result, if two identical _configsvrReshardCollection commands execute in quick succession (e.g. due to an election on the primary shard for the database), both can see no existing coordinator and proceed to create one.

As seen in BF-23979, this will manifest as a Location5808201 error for the coordinator that loses the race to set allowMigrations to false.
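
The race described above is a check-then-act problem. The following is a minimal sketch of it in Python, not the server's C++ code: RacyRegistry, AtomicRegistry, KEY, and the helper functions are hypothetical names invented for illustration, and the barrier only exists to force the unlucky interleaving deterministically. It assumes nothing about the real PrimaryOnlyService API beyond the idea that the conflict check and the instance creation should happen as a single atomic step.

```python
import threading

KEY = "reshard test.coll"  # hypothetical identifier for one resharding operation


class RacyRegistry:
    """Lookup and creation are separate calls, so two callers can interleave."""

    def __init__(self):
        self.instances = []

    def find_existing(self, key):
        # Iterates over the current instances, looking for one to join.
        return next((i for i in self.instances if i == key), None)

    def create(self, key):
        self.instances.append(key)


class AtomicRegistry:
    """The conflict check and the insert happen under a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.instances = []

    def get_or_create(self, key):
        with self._lock:
            if key in self.instances:
                return False  # an equivalent instance exists; the caller joins it
            self.instances.append(key)
            return True  # this caller created the instance


def run_in_two_threads(target):
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_racy():
    registry = RacyRegistry()
    barrier = threading.Barrier(2)

    def command():
        existing = registry.find_existing(KEY)
        barrier.wait()  # force both checks to finish before either create
        if existing is None:
            registry.create(KEY)

    run_in_two_threads(command)
    return len(registry.instances)


def run_atomic():
    registry = AtomicRegistry()
    created = []

    def command():
        created.append(registry.get_or_create(KEY))

    run_in_two_threads(command)
    return len(registry.instances), created.count(True)
```

In this toy model, run_racy() ends with two instances for the same operation, while run_atomic() ends with one instance and exactly one caller reporting that it created it.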



 Comments   
Comment by Githook User [ 20/Sep/22 ]

Author: Abdul Qadeer <abdul.qadeer@mongodb.com> (username: zorro786)

Message: SERVER-62720 Implement checkIfConflictsWithOtherInstances for ReshardingCoordinatorService
Branch: master
https://github.com/mongodb/mongo/commit/e782061186dae5a650d8d28a80e4c4e92051a528

Comment by Githook User [ 10/Sep/22 ]

Author: Abdul Qadeer <abdul.qadeer@mongodb.com> (username: zorro786)

Message: Revert "SERVER-62720 Implement checkIfConflictsWithOtherInstances for ReshardingCoordinatorService"

This reverts commit 08a4e5e3e0e157e543b2b77f285560361e23a49f.
Branch: master
https://github.com/mongodb/mongo/commit/9fb29bd06a20abd76b956025757d64bafffec0d1

Comment by Githook User [ 07/Sep/22 ]

Author: Abdul Qadeer <abdul.qadeer@mongodb.com> (username: zorro786)

Message: SERVER-62720 Implement checkIfConflictsWithOtherInstances for ReshardingCoordinatorService
Branch: master
https://github.com/mongodb/mongo/commit/08a4e5e3e0e157e543b2b77f285560361e23a49f
