Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.2.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-nyc-subteam1
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v6.1, v6.0, v5.0
Sprint:
Sharding 2022-08-08, Sharding 2022-08-22, Sharding 2022-09-05, Sharding 2022-09-19, Sharding 2022-10-03
Linked BF Score:
60
Story Points:
4
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Currently, the _configsvrReshardCollection command relies on its own getExistingInstanceToJoin() function to determine whether an existing ReshardingCoordinator is already executing the same resharding operation. This function fetches and iterates over the existing ReshardingCoordinator instances rather than performing the check atomically by overriding checkIfConflictsWithOtherInstances(). As a result, if two identical _configsvrReshardCollection commands execute in quick succession (e.g. due to an election on the primary shard for the database), both can observe no existing coordinator and each proceed to create one.

As seen in BF-23979, this will manifest as a Location5808201 error for the coordinator that loses the race to set allowMigrations to false.
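
The difference between the check-then-act lookup and an atomic conflict check can be sketched in Python. This is only an illustration under stated assumptions, not the server's implementation: CoordinatorRegistry, find_existing, get_or_create and run_concurrent_requests are hypothetical names, and a plain lock stands in for whatever synchronization the real service uses.

```python
import threading


class CoordinatorRegistry:
    """Hypothetical stand-in for a service that owns coordinator instances."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}  # namespace -> coordinator state (a plain dict here)

    def find_existing(self, namespace, shard_key):
        # Check-then-act style lookup: the answer can be stale by the time
        # the caller decides to create a new coordinator.
        with self._lock:
            inst = self._instances.get(namespace)
        if inst is not None and inst["shard_key"] == shard_key:
            return inst
        return None

    def get_or_create(self, namespace, shard_key):
        # Atomic variant: the conflict check and the insertion happen under
        # the same lock, so only one caller can ever create the instance.
        with self._lock:
            inst = self._instances.get(namespace)
            if inst is None:
                inst = {"namespace": namespace, "shard_key": shard_key}
                self._instances[namespace] = inst
                return inst, True
            if inst["shard_key"] != shard_key:
                raise RuntimeError("conflicting resharding operation in progress")
            return inst, False  # identical request joins the existing instance


def run_concurrent_requests(registry, namespace, shard_key, n_threads=8):
    """Issue identical requests from several threads released at the same time."""
    results = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # line all threads up to maximize contention
        outcome = registry.get_or_create(namespace, shard_key)
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With get_or_create, exactly one of the identical requests creates the instance and the rest join it. A caller that instead invokes find_existing and then inserts in a separate step leaves a window in which two requests can both see no instance, which is the shape of the race described above.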

causes

SERVER-70746 _configsvrReshardCollection Will Not Join Existing Operations After Shard Key is Updated

Closed

related to

SERVER-78604 ReshardingCoordinatorService Index build deadlocks with OpObserver

Backlog

Assignee: Abdul Qadeer
Reporter: Brett Nawrocki
Participants: Abdul Qadeer, Brett Nawrocki, Githook User
Votes: 0
Watchers: 3

Created: Jan 18 2022 05:43:45 PM UTC
Updated: Oct 29 2023 09:44:00 PM UTC
Resolved: Sep 20 2022 08:55:12 PM UTC
Confidence Status Last Update: 14/Sep/22 3:22 PM
