[SERVER-77479] Sharded rename participants may incorrectly snapshot/restore pending range deletion documents Created: 25/May/23  Updated: 24/Nov/23  Resolved: 03/Oct/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 6.0.6, 6.3.1, 5.0.18, 7.0.0-rc2
Fix Version/s: 7.1.1, 7.2.0-rc0, 7.0.3, 6.0.12

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0, v6.0, v5.0
Sprint: Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02, Sharding EMEA 2023-10-16
Participants:
Linked BF Score: 5
Story Points: 3

 Description   

When migrations are stopped on a sharded collection that is being renamed, the flow triggers a refresh on every shard so that each shard discovers the stopMigrations flag and aborts any ongoing migration before returning.

However, if the donor steps down right at the end of a refresh, the refresh may succeed even though the abort has failed: the wait for the abort never throws because the migration source manager doesn't invalidate the future in case of error. As a result, the refresh spawned by stopMigration succeeds and the coordinator can proceed to the next phase before the abort has completed, that is, before the range deletion document has been deleted locally on the donor and the range deletion task has been flagged as ready on the recipient.
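
The failure mode described above can be illustrated with a minimal, self-contained C++ sketch. It is a simplified model built on std::promise and std::shared_future, not the server's actual code: SketchMigration, finishAlwaysOk, finishPropagatingError and refreshSucceeds are hypothetical names introduced only for this illustration. It assumes that, before the fix, the completion future resolves successfully even when commit/cleanup did not finish, and that the fix sets it to an error instead, as the commit message states.

```cpp
#include <future>
#include <stdexcept>

// Hypothetical stand-in for a migration whose completion other threads wait on.
class SketchMigration {
public:
    SketchMigration() : _completion(_promise.get_future().share()) {}

    // Assumed pre-fix behaviour: completion is signalled as successful
    // regardless of whether commit/cleanup actually completed.
    void finishAlwaysOk(bool cleanupCompleted) {
        (void)cleanupCompleted;  // the outcome is ignored
        _promise.set_value();
    }

    // Assumed post-fix behaviour: an incomplete commit/cleanup turns the
    // completion into an error, so waiters observe the failure.
    void finishPropagatingError(bool cleanupCompleted) {
        if (cleanupCompleted) {
            _promise.set_value();
        } else {
            _promise.set_exception(std::make_exception_ptr(
                std::runtime_error("migration commit/cleanup has not completed")));
        }
    }

    std::shared_future<void> completion() const { return _completion; }

private:
    std::promise<void> _promise;           // declared first, so initialized first
    std::shared_future<void> _completion;  // what the refresh waits on
};

// Models the refresh waiting for the abort: returns true when the wait
// returns normally, i.e. when the refresh would report success.
bool refreshSucceeds(const SketchMigration& migration) {
    try {
        migration.completion().get();
        return true;
    } catch (const std::exception&) {
        return false;
    }
}

// Donor steps down before cleanup completes, pre-fix signalling.
bool preFixRefreshSucceedsOnStepDown() {
    SketchMigration migration;
    migration.finishAlwaysOk(/*cleanupCompleted=*/false);
    return refreshSucceeds(migration);
}

// Same scenario with the error propagated to the completion future.
bool postFixRefreshSucceedsOnStepDown() {
    SketchMigration migration;
    migration.finishPropagatingError(/*cleanupCompleted=*/false);
    return refreshSucceeds(migration);
}
```

In this model, preFixRefreshSucceedsOnStepDown() reports a successful refresh although cleanup never ran, which is the window in which the coordinator can move on; postFixRefreshSucceedsOnStepDown() reports a failure instead.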

This is problematic because:

  1. When snapshotting range deletions, a rename participant may end up copying a document flagged as pending right before the migration deletes it (on the donor) or unflags it (on the recipient).
  2. The pending task would then be restored in the next phase.

The result is that:

  • On the donor side: if the range deletion document gets deleted right between steps 1 and 2, the document restored in step 2 would be marked as "pending" forever.
  • On the recipient side: if the range deletion happens to be executed right between steps 1 and 2, the document restored in step 2 would be marked as "pending" forever.
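
The interleaving behind both outcomes can be replayed deterministically with a small C++ sketch. It models the range deletion documents as an in-memory map and is only an illustration under stated assumptions: RangeDeletionDoc, snapshot, restore, donorRace, recipientRace and the "range-1" key are hypothetical, and restore is assumed to re-insert any snapshotted document that is no longer present.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified range deletion document: only the pending flag.
struct RangeDeletionDoc {
    bool pending;
};

// Hypothetical in-memory stand-in for a shard's range deletion documents,
// keyed by an illustrative range id.
using RangeDeletions = std::map<std::string, RangeDeletionDoc>;

// Step 1: the rename participant copies the current documents.
RangeDeletions snapshot(const RangeDeletions& live) {
    return live;
}

// Step 2: the participant restores the snapshotted documents
// (assumption: documents that are missing get re-inserted).
void restore(RangeDeletions& live, const RangeDeletions& snap) {
    live.insert(snap.begin(), snap.end());
}

// Donor: the migration deletes the pending document between steps 1 and 2.
RangeDeletions donorRace() {
    RangeDeletions live{{"range-1", RangeDeletionDoc{true}}};
    const RangeDeletions snap = snapshot(live);  // copies the pending document
    live.erase("range-1");                       // abort cleanup deletes it
    restore(live, snap);                         // pending document reappears
    return live;
}

// Recipient: the document is unflagged and the range deletion is executed
// (removing the document) between steps 1 and 2.
RangeDeletions recipientRace() {
    RangeDeletions live{{"range-1", RangeDeletionDoc{true}}};
    const RangeDeletions snap = snapshot(live);  // copies the pending document
    live["range-1"].pending = false;             // task flagged as ready
    live.erase("range-1");                       // range deletion executed
    restore(live, snap);                         // stale pending copy restored
    return live;
}
```

In both functions the returned map contains a single "range-1" document whose pending flag is still true, and in this model nothing remains that would ever clear it.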

At the time of writing, this ticket affects all versions that support sharded rename, that is, all versions >= v5.0.0.



 Comments   
Comment by Githook User [ 31/Oct/23 ]

Author: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com, username: pierlauro)

Message: SERVER-77479 Set MigrationSourceManager::_completion to an error when the migration commit/cleanup has not completed
Branch: v7.1
https://github.com/mongodb/mongo/commit/e1177471313141260f25a007fe1d9bfbd1f2222e

Comment by Githook User [ 06/Oct/23 ]

Author: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com, username: pierlauro)

Message: SERVER-77479 Set MigrationSourceManager::_completion to an error when the migration commit/cleanup has not completed
Branch: v7.0
https://github.com/mongodb/mongo/commit/9a9d362853c1518f2aadc54224269348ff43e73c

Comment by Githook User [ 06/Oct/23 ]

Author: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com, username: pierlauro)

Message: SERVER-77479 Set MigrationSourceManager::_completion to an error when the migration commit/cleanup has not completed
Branch: v6.0
https://github.com/mongodb/mongo/commit/7d96453dd8ea852dcc6b065be19476901276a5ab

Comment by Githook User [ 03/Oct/23 ]

Author: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com, username: pierlauro)

Message: SERVER-77479 Set MigrationSourceManager::_completion to an error when the migration commit/cleanup has not completed
Branch: master
https://github.com/mongodb/mongo/commit/e009e807eba6109e493dd493ee8263f15da0d03a

Generated at Thu Feb 08 06:35:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.