- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 8.3.0-rc0, 8.2.0
- Component/s: None
- Cluster Scalability
- ALL
- Routing and Topology
When downgrading the FCV to 8.0, we abort ongoing chunk migrations. As in SERVER-92484, if a chunk migration is aborted after it was committed but before the corresponding range deletion was deleted, the range deletion remains in "pending:true" status, so its orphans won't be cleaned up until the migration is lazily recovered: either on the next query on that namespace (if any) or on the next step-up.
SERVER-103749 implemented a workaround for CheckOrphansDeleted getting stuck by forcing migration recovery in jstests, but it doesn't always work: the migration coordinator document is on the donor shard, but the pending range deletion is also on the recipient shard, which may get checked first.
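The ordering problem can be illustrated with a toy in-memory model. This is only a sketch, not the actual jstest helper: the shard names, document shapes, and the functions `makeCluster`, `forceRecovery`, and `shardsStuckOnPending` are all hypothetical. It also assumes that recovering a committed migration marks the donor's range deletion as ready and removes the recipient's.

```javascript
// Toy in-memory model of the scenario described above. It is NOT the real
// jstest helper; every name and document shape here is hypothetical.

// State left behind by a migration that committed but was aborted before cleanup.
function makeCluster() {
    return {
        donor: {
            migrationCoordinators: [{migrationId: "m1", recipient: "recipient"}],
            rangeDeletions: [{migrationId: "m1", pending: true}],
        },
        recipient: {
            migrationCoordinators: [],
            rangeDeletions: [{migrationId: "m1", pending: true}],
        },
    };
}

// Forced recovery can only act on coordinator documents stored on `shardName`.
// Assumption: recovering a committed migration marks the donor's range deletion
// as ready and removes the recipient's range deletion.
function forceRecovery(cluster, shardName) {
    const shard = cluster[shardName];
    for (const coord of shard.migrationCoordinators) {
        for (const task of shard.rangeDeletions) {
            if (task.migrationId === coord.migrationId) {
                task.pending = false;
            }
        }
        const recipient = cluster[coord.recipient];
        recipient.rangeDeletions = recipient.rangeDeletions.filter(
            (task) => task.migrationId !== coord.migrationId);
    }
    shard.migrationCoordinators = [];
}

// Visit shards in the given order, forcing recovery on each one before checking
// it, and report the shards that would still see a pending range deletion.
function shardsStuckOnPending(cluster, order) {
    const stuck = [];
    for (const shardName of order) {
        forceRecovery(cluster, shardName);
        if (cluster[shardName].rangeDeletions.some((task) => task.pending)) {
            stuck.push(shardName);
        }
    }
    return stuck;
}

console.log(shardsStuckOnPending(makeCluster(), ["donor", "recipient"]));
console.log(shardsStuckOnPending(makeCluster(), ["recipient", "donor"]));
```

In this model, visiting the donor first leaves no shard stuck, while visiting the recipient first reports the recipient as stuck, because the recipient holds no coordinator document to recover from.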
We may want to review the FCV downgrade code or the workaround to avoid this issue.
A reproducer is attached.
is related to
- SERVER-92484 Killing chunk migration session after commit can leave uncleaned orphan documents (Blocked)
- SERVER-103749 CheckOrphansAreDeletedHelpers must account for lazy recovery of unfinished migrations (Closed)
related to
- SERVER-107142 The recovery of chunk migrations may cause a server crash of the donor shard when nodes run mixed binaries (Closed)