Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-234-M2
- PM-234-T-lifecycle

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Sharding 2021-01-25, Sharding 2021-02-08
Story Points:
2
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The ReshardingCoordinator should retrieve the donor shards for the resharding operation only after allowMigrations:false has been set on the original collection and/or locks have been acquired, so that no concurrent moveChunk can start or commit in the meantime.
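A minimal sketch of the ordering described above, assuming a hypothetical in-memory model of a collection's routing metadata (the class `CollectionMetadata` and its methods are illustrative only, not MongoDB's API). Migrations are disabled and the donor set is read under the same lock that a moveChunk must take to commit, so no migration can slip in between the two steps.

```python
import threading


class CollectionMetadata:
    """Hypothetical stand-in for a collection's routing metadata (not MongoDB's API)."""

    def __init__(self, chunk_owners):
        # Maps a chunk id to the shard that currently owns it.
        self.chunk_owners = dict(chunk_owners)
        self.allow_migrations = True
        self._lock = threading.Lock()

    def move_chunk(self, chunk_id, to_shard):
        """Commit a migration unless migrations have been disabled."""
        with self._lock:
            if not self.allow_migrations:
                raise RuntimeError("migrations are disabled for this collection")
            self.chunk_owners[chunk_id] = to_shard

    def block_migrations_and_get_donors(self):
        """Disable migrations first, then read the donor set under the same lock."""
        with self._lock:
            self.allow_migrations = False
            return sorted(set(self.chunk_owners.values()))


meta = CollectionMetadata({"c1": "shard0", "c2": "shard1"})
donors = meta.block_migrations_and_get_donors()
print(donors)  # ['shard0', 'shard1']

try:
    meta.move_chunk("c1", "shard2")
except RuntimeError as err:
    print(err)  # migrations are disabled for this collection

# The donor set is still accurate because the late moveChunk was rejected.
print(sorted(set(meta.chunk_owners.values())))  # ['shard0', 'shard1']
```

Holding one lock for both "set the flag" and "read the owners" is what makes the donor list trustworthy; setting the flag without excluding an in-flight commit would leave a window for the same race.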

In the current code, the ReshardingCoordinator can end up with an incorrect list of donors and recipients.

Consider the following scenario:

1. A failpoint pauses reshardCollectionCmd right after it gets the donors/recipients from the chunkManager, but before it even creates the ReshardingCoordinatorService instance, let alone sets allowMigrations:false on the original collection.
2. A moveChunk comes in and succeeds.
3. The failpoint is released; reshardCollection begins and tries to run with donor shards that no longer own the data.
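The interleaving in the scenario can be replayed deterministically with a small sketch. It assumes a hypothetical two-chunk collection (`c1` on `shard0`, `c2` on `shard1`); the helper names `donors_of` and `replay_scenario` are illustrative, not part of the MongoDB codebase. It compares reading donors before migrations are blocked (the reported bug) with reading them afterwards.

```python
def donors_of(chunk_owners):
    """Return the sorted list of shards that own at least one chunk."""
    return sorted(set(chunk_owners.values()))


def replay_scenario(read_donors_before_blocking):
    """Replay the reported interleaving; return (donors used, actual owners)."""
    # Hypothetical chunk-to-shard map for the collection being resharded.
    chunk_owners = {"c1": "shard0", "c2": "shard1"}
    donors = None

    if read_donors_before_blocking:
        # Buggy order: donors are read, then the failpoint pauses the command.
        donors = donors_of(chunk_owners)

    # While the command is paused, migrations are still allowed, so a
    # concurrent moveChunk succeeds and shard1 gives up its only chunk.
    chunk_owners["c2"] = "shard0"

    # The failpoint is released; allowMigrations:false is set at this point.
    if not read_donors_before_blocking:
        # Fixed order: donors are read only once migrations are blocked.
        donors = donors_of(chunk_owners)

    return donors, donors_of(chunk_owners)


print(replay_scenario(True))   # (['shard0', 'shard1'], ['shard0'])
print(replay_scenario(False))  # (['shard0'], ['shard0'])
```

With the buggy order, `shard1` is still treated as a donor even though it no longer owns any data; reading the donors after migrations are blocked yields the actual owners.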

Related to:

SERVER-54023 Complete TODO listed in SERVER-53330 (Closed)

Assignee: Blake Oler
Reporter: Haley Connelly
Participants: Blake Oler, Githook User, Haley Connelly
Votes: 0
Watchers: 3

Created: Dec 10 2020 08:13:22 PM UTC
Updated: Oct 29 2023 09:59:36 PM UTC
Resolved: Jan 25 2021 04:20:49 PM UTC
Confidence Status Last Update: 11/Jan/21 4:59 PM
