Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0, 7.0.13, 8.0.0-rc14
Affects Version/s: 7.3.3, 7.0.12, 8.0.0-rc10
Component/s: None
Labels: None
Assigned Teams: Catalog and Routing
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v8.0, v7.3, v7.0
Sprint: CAR Team 2024-07-08, CAR Team 2024-07-22
Linked BF Score: 0
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

When a chunk migration happens, on the recipient side we wait for ongoing range deletions on overlapping ranges before persisting a range deletion document.

But on the donor side we assume that no range deletion document exists locally for the range being moved.

That assumption is wrong, because the following sequence can occur:

1. A migration from shardA to shardB commits on the CSRS but is interrupted right before the pending range deletion task on the recipient shard (shardB) is deleted. shardA will need to recover the migration and set things right on the range deletion document.
2. shardB migrates the newly received chunk to shardC right away, flagging a range deletion task matching the moved range as ready (relying on collection uuid + boundaries). This can result in flagging as ready the pending range deletion task left over from the migration that happened at (1).
3. shardA recovers the migration and deletes the range deletion task on shardB (relying on migration id + collection id + boundaries). This is a no-op because of (2).

As a result, the range deletion task for the migration that happened at (2) will stay flagged as pending forever.
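
The sequence (1) to (3) can be sketched as a small in-memory simulation. This is only an illustrative model, not the server code: the `RangeDeletionTask` class, the helper names, and the migration ids `m1`/`m2` are hypothetical. It also assumes that the update in (2) touches the first matching document, and that the recovery delete in (3) only matches documents still flagged as pending, which is one way the no-op could arise.

```python
from dataclasses import dataclass


@dataclass
class RangeDeletionTask:
    """Hypothetical stand-in for a range deletion document on a shard."""
    migration_id: str
    collection_uuid: str
    min_key: int
    max_key: int
    pending: bool = True


def mark_ready_by_range(tasks, collection_uuid, min_key, max_key):
    """Step (2): flag as ready the first task matching uuid + boundaries only."""
    for task in tasks:
        if (task.collection_uuid, task.min_key, task.max_key) == (
            collection_uuid, min_key, max_key
        ):
            task.pending = False
            return task.migration_id
    return None


def delete_pending_task(tasks, migration_id, collection_uuid, min_key, max_key):
    """Step (3): delete by migration id + uuid + boundaries.

    Assumption: only tasks still flagged as pending are matched.
    Returns the number of deleted tasks.
    """
    matches = [
        t for t in tasks
        if t.pending
        and t.migration_id == migration_id
        and (t.collection_uuid, t.min_key, t.max_key)
        == (collection_uuid, min_key, max_key)
    ]
    for t in matches:
        tasks.remove(t)
    return len(matches)


def reproduce_race():
    shard_b_tasks = []
    # (1) shardA -> shardB commits; the recipient-side pending task is left behind.
    shard_b_tasks.append(RangeDeletionTask("m1", "coll-uuid", 0, 10))
    # (2) shardB -> shardC: shardB, now the donor, persists its own task...
    shard_b_tasks.append(RangeDeletionTask("m2", "coll-uuid", 0, 10))
    # ...and flags "the" task for that range as ready, hitting the stale one.
    flagged = mark_ready_by_range(shard_b_tasks, "coll-uuid", 0, 10)
    # (3) shardA recovers m1 and tries to delete the recipient-side task.
    deleted = delete_pending_task(shard_b_tasks, "m1", "coll-uuid", 0, 10)
    stuck = [t.migration_id for t in shard_b_tasks if t.pending]
    return flagged, deleted, stuck
```

In this model the task flagged as ready is the leftover from `m1`, the recovery delete removes nothing, and the donor-side task from `m2` is the one left pending.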

The consequence is that no range overlapping with the pending range deletion task will ever be moved back. In the worst case, this may leave the balancer unable to migrate chunks from shardC to shardB due to the chunk selection policy (always picking the lowest chunk from the donor shard).
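
To make the blocking effect concrete, here is a minimal sketch of an overlap check between an incoming chunk range and the local range deletion tasks. It assumes half-open `[min, max)` ranges with integer keys; the function names and the tuple layout are hypothetical and only illustrate why a task that never runs keeps blocking any overlapping range.

```python
def ranges_overlap(a_min, a_max, b_min, b_max):
    """True when half-open ranges [a_min, a_max) and [b_min, b_max) intersect."""
    return a_min < b_max and b_min < a_max


def blocking_tasks(incoming_min, incoming_max, local_tasks):
    """Return the local (min, max) task ranges an incoming chunk would wait on."""
    return [
        (t_min, t_max)
        for (t_min, t_max) in local_tasks
        if ranges_overlap(incoming_min, incoming_max, t_min, t_max)
    ]


# A task stuck on [0, 10) blocks any incoming range that touches it,
# while an adjacent range such as [10, 20) is unaffected.
stuck_tasks = [(0, 10)]
blocked = blocking_tasks(5, 15, stuck_tasks)
unblocked = blocking_tasks(10, 20, stuck_tasks)
```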

is caused by

SERVER-69586 Make update/delete of range deletion document on recipient idempotent

Closed

is related to

SERVER-92381 Ensure MigrationSourceManager fulfills its promise when aborting in early stages

Closed

Assignee: Paolo Polato
Reporter: Pierlauro Sciarelli
Participants: Githook User, Paolo Polato, Pierlauro Sciarelli
Votes: 0
Watchers: 3

Created: Jun 28 2024 08:33:16 AM UTC
Updated: Jul 30 2024 09:29:10 AM UTC
Resolved: Jul 09 2024 08:02:21 AM UTC
Confidence Status Last Update: 01/Jul/24 8:01 AM
