[SERVER-62216] When abortTenantIndexBuilds failed to abort during a tenant migration, we should wait for the createIndex to finish before continuing the MTM Created: 21/Dec/21  Updated: 29/Oct/23  Resolved: 02/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.1.0
Fix Version/s: 5.3.0

Type: Bug Priority: Major - P3
Reporter: Mathis Bessa Assignee: Mathis Bessa
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-62154 The tenant_migration.py script should... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Server Serverless 2021-12-27, Server Serverless 2022-01-10, Server Serverless 2022-01-24, Server Serverless 2022-02-07
Participants:
Linked BF Score: 15

 Description   

We found out that when the tenant_migration_donor_service calls to abort the index build, we end up not aborting the index build because of the current state of the createIndex which here was kCommitQuorumSatisfied

We should handle when IndexBuildsCoordinator::abortTenantIndexBuilds calls 
abortIndexBuildByBuildUUID and returns false, we should propagate that failure in order to then abort the tenant migration for that reason. We should also use the reason parameter in order to give more information in the log line.

It was decided instead we should wait for the indexBuilds to finish unregistering the index build. There could be a race condition where we would return committed but the index build hasn't unregistered yet and this could be an issue too.

The solution to this is to always wait for the unregisterIndexBuild to finish.
We should also take this opportunity to give more details in the logs to help diagnose this type of issues / failures.

In addition to this we will also handle that case to not be an error in the tenant_migration.py hook here : SERVER-62154



 Comments   
Comment by Githook User [ 22/Jan/22 ]

Author:

{'name': 'mathisbessamdb', 'email': 'mathis.bessa@mongodb.com', 'username': 'mathisbessamdb'}

Message: SERVER-62216 When abortTenantIndexBuilds failed to abort during a tenant migration
Branch: master
https://github.com/mongodb/mongo/commit/98f6025860618721a2fa414a0345caf4c4d5bd57

Comment by Suganthi Mani [ 31/Dec/21 ]

Just hit my brain, technically,  for the case where abortIndexBuildByBuildUUID() returns true for TryAbortResult::kAlreadyAborted(), it is guaranteed that the index build has completed aborting process i.e, an abortIndexBuild oplog entry has generated for that index build. Basically, abortIndexBuildByBuildUUID()
1) Acquire the collection lock in mode X.
2) tryAbort() (sets signal to the index build)
3) _completeAbort() (removes index entry from durable catalog & generates abortOplogEntry).

This means, we can't have 2 concurrent threads one (tenantMigration thread) executing tryAbort() and other(e.g., dropCollection thread) executing _completeAbort(). This means option#1 (Wait for this ReplIndexBuildState::sharedPromise to get fulfilled) is sufficient and it's ok to wait for that promise to be fulfilled only for abortIndexBuildByBuildUUID() false case. For more clarity to readers, I would also recommend to correct this comment to something, like,
"Returns true if the index build is aborted or already aborted."

UPDATE:
After looking into the code, I realized option#1 is not safe and can make the tenant migration thread to hang forever, waiting for ReplIndexBuildState::sharedPromise to get fulfilled. There is a case where the index build thread won't fulfill the ReplIndexBuildState::sharedPromise. For those cases, BrokenPromise error is set on calling SharedPromise destructor, that's called when ReplIndexBuildState destructor gets called. But the tenantMigration abort thread is holding a reference to ReplIndexBuildState which means the ReplIndexBuildState & sharedPromise destructor's won't be called, leading to tenant migration thread hang.

So, I think, it's safer to wait for the index build to get removed from the index build registry.

CC mathis.bessa

Comment by Suganthi Mani [ 31/Dec/21 ]

For Future reference: Following is the another scenario where we can hit the same issue as BF-23463,  IndexBuildsCoordinator::abortTenantIndexBuilds() calls abortIndexBuildByBuildUUID() to abort the index builds. abortIndexBuildByBuildUUID() returns
1) False if the index build does not exist or the index build is already in the process of committing (i.e, the index build has already received signal - IndexBuildAction::kCommitQuorumSatisfied).
2) True if the index build was aborted or the index build is already in the process of being aborted.

BF-23463 showcases the case #1 "where the index build is already in the process of committing". But we can hit the same BF issue for case #2 "the index build is already in the process of being aborted". Looking into the code, we can see ReplIndexBuildState::tryAbort() returns TryAbortResult::kAlreadyAborted() if the index build has already received signal "IndexBuildAction::kPrimaryAbort" (Before we drop the indexes/collection/databases, we try to abort the active index builds using signal "kPrimaryAbort" ) but trying to set "IndexBuildAction::kTenantMigrationAbort". However, for TryAbortResult::kAlreadyAborted() IndexBuildsCoordinator::abortTenantIndexBuilds() returns true. And for this case, it's not guaranteed that index build has finished completing aborting process i.e, has generated the "abortIndexBuild" oplog entry. This means there is a possibility that an "abortIndexBuild" can be generated after the tenant migration has started (i.e, after _startApplyingDonorOpTime), leading to the server crash.

In order to tackle both the problematic scenarios , our proposal is to wait for the index build to complete (specifically make sure the index build has finished generating the commit/abortIndexBuild oplog entry) irrespective of abortIndexBuildByBuildUUID() return values. There are currently 2 options:

1) Wait for this ReplIndexBuildState::sharedPromise to get fulfilled.

  • Rejected: This still doesn't guarantee that an "abortIndexBuild" oplog entry has already been generated for the index build, for cases where the index build has already received "kPrimaryAbort" signal.

2) Wait for the index build to get removed from the index build registry.

CC mathis.bessa

Generated at Thu Feb 08 05:54:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.