Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.1.1, 7.2.0-rc0, 7.0.3, 6.0.12
Affects Version/s: 6.0.6, 6.3.1, 5.0.18, 7.0.0-rc2
Component/s: Sharding
Labels:
- shardingemea-qw

Assigned Teams:
Sharding EMEA
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v7.1, v7.0, v6.0, v5.0
Sprint:
Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02, Sharding EMEA 2023-10-16
Linked BF Score:
0
Story Points:
3
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None
Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When migrations are stopped on a sharded collection that is being renamed, the flow triggers a refresh on every shard so that each shard discovers the stopMigrations flag and aborts any ongoing migration before returning.

However, if the donor steps down right at the end of a refresh, the refresh may succeed even though the abort has failed: the wait for the abort never throws, because the migration source manager does not invalidate the future in case of error. As a result, the refresh spawned by stopMigration succeeds and the coordinator can proceed to the next phase before the abort has completed, that is, before the donor has locally deleted its range deletion document and the recipient has flagged its range deletion task as ready.
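The failure mode can be illustrated with a minimal Python sketch built on `concurrent.futures.Future`. It is only a model under stated assumptions: `StepDownError`, `AbortCompletion`, `refresh_succeeds` and the `invalidate_on_error` switch are hypothetical names, not the server's migration source manager. It shows why a waiter cannot observe a failed abort unless the future is completed with the error.

```python
from concurrent.futures import Future


class StepDownError(Exception):
    """Hypothetical stand-in for the error raised when the donor steps down."""


class AbortCompletion:
    """Hypothetical model of the future a migration exposes to callers waiting
    for its abort to finish.

    invalidate_on_error=False mimics the behaviour described in the ticket: the
    future is fulfilled successfully even when the abort fails.
    invalidate_on_error=True propagates the failure to every waiter.
    """

    def __init__(self, invalidate_on_error: bool) -> None:
        self.invalidate_on_error = invalidate_on_error
        self.future: Future = Future()

    def run_abort(self, step_down: bool) -> None:
        try:
            if step_down:
                raise StepDownError("donor stepped down before the abort completed")
            # Placeholder for the real cleanup (range deletion document handling).
        except StepDownError as err:
            if self.invalidate_on_error:
                self.future.set_exception(err)
                return
            # Faulty path: the error is swallowed and execution falls through.
        self.future.set_result(None)


def refresh_succeeds(completion: AbortCompletion) -> bool:
    """Hypothetical refresh that waits for the abort to finish.

    Future.result() re-raises an exception stored with set_exception(), so the
    refresh can only report the failure if the future was invalidated.
    """
    try:
        completion.future.result(timeout=1)
    except StepDownError:
        return False
    return True


faulty = AbortCompletion(invalidate_on_error=False)
faulty.run_abort(step_down=True)
print(refresh_succeeds(faulty))  # True: the refresh succeeds although the abort failed

fixed = AbortCompletion(invalidate_on_error=True)
fixed.run_abort(step_down=True)
print(refresh_succeeds(fixed))  # False: the failure reaches the waiter
```

In this model the only difference between the two cases is whether the error path completes the future with `set_exception()` or lets it fall through to `set_result()`.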

This is problematic because:

1. When snapshotting range deletions, a rename participant may end up copying a document still flagged as pending right before the migration deletes it (on the donor) or clears its pending flag (on the recipient).
2. The pending task is then restored in the next phase.

The result is that:

- On the donor side: if the range deletion document gets deleted right between steps 1 and 2, the document restored in step 2 remains marked as "pending" forever.
- On the recipient side: if the range deletion happens to be executed right between steps 1 and 2, the document restored in step 2 remains marked as "pending" forever.
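The interleaving in the two steps above can be modelled with a small Python sketch. Everything in it is hypothetical: the document layout `{"pending": True}`, the range key `range-1`, the helper names, and the simplification that "restore" just returns the snapshot are assumptions for illustration, not the server's actual data model or code.

```python
import copy
from typing import Dict

RangeDeletions = Dict[str, Dict[str, bool]]


def complete_abort(docs: RangeDeletions, role: str) -> None:
    """Hypothetical local effect of the migration abort on range deletion documents."""
    if role == "donor":
        # The donor deletes its range deletion document.
        docs.pop("range-1", None)
    else:
        # The recipient clears the pending flag; the range deletion then runs
        # and removes the document once the orphaned range has been cleaned up.
        docs["range-1"]["pending"] = False
        docs.pop("range-1", None)


def rename_participant(role: str, abort_done_before_snapshot: bool) -> RangeDeletions:
    """Returns the range deletion documents a participant ends up restoring."""
    docs: RangeDeletions = {"range-1": {"pending": True}}
    if abort_done_before_snapshot:
        complete_abort(docs, role)  # Expected ordering: abort fully done first.
    snapshot = copy.deepcopy(docs)  # Step 1: snapshot the range deletions.
    if not abort_done_before_snapshot:
        complete_abort(docs, role)  # Late abort lands between steps 1 and 2.
    return copy.deepcopy(snapshot)  # Step 2: restore the snapshot.


for role in ("donor", "recipient"):
    print(role, rename_participant(role, True), rename_participant(role, False))
```

With the late ordering, both roles restore a document still flagged as pending, and since the migration that owned it has already finished in this model, nothing is left to clear the flag.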

At the time of writing, this bug affects all versions supporting sharded rename, hence all versions >= v5.0.0.

Assignee: Pierlauro Sciarelli
Reporter: Pierlauro Sciarelli
Participants: Githook User, Pierlauro Sciarelli
Votes: 0
Watchers: 3

Created: May 25 2023 02:56:25 PM UTC
Updated: Nov 24 2023 04:10:15 PM UTC
Resolved: Oct 03 2023 07:51:56 PM UTC
Confidence Status Last Update: 29/Sep/23 5:19 PM
