[SERVER-62340] Tenant Migration can lead to leakage of "TenantMigrationBlockerAsync" threads. Created: 04/Jan/22  Updated: 29/Oct/23  Resolved: 14/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Didier Nadeau
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Server Serverless 2022-01-24
Participants:
Linked BF Score: 15

 Description   

While investigating the BF, we found that the tenant migration donor code can leak "TenantMigrationBlockerAsync" threads. Consider the following scenario:

1) The donor starts a migration for tenant "foo".
2) On receiving the state doc in the "kAbortingIndexBuilds" state, the donor secondary creates an access blocker for tenant "foo" and adds it to the blocker registry. The shared reference count on the donor access blocker is now 1.
3) The donor secondary receives a find command for tenant "foo".
4) The find request calls checkIfCanReadOrBlock(), which asynchronously waits on the canRead promise to be fulfilled and captures the tenant's donor access blocker as a shared pointer. The shared reference count on the donor access blocker is now 2.
5) The donor primary durably commits the migration for tenant "foo" and marks the state document as garbage collectible.
6) On receiving the donor state doc with 'expireAt' set, the donor secondary fulfills the canRead promise and removes the donor access blocker for tenant "foo" from the blocker registry. The shared reference count on the donor access blocker drops back to 1.
7) On canRead promise fulfillment, the continuation chain runs on _asyncBlockingOperationsExecutor, which is backed by the thread pool TenantMigrationBlockerAsyncThreadPool. Once the work in the chain is completed, it releases all the captured resources. This decrements the shared reference count on the donor access blocker by 1, i.e., the number of shared owners is now 0.

This results in the TenantMigrationDonorAccessBlocker destructor being called, which in turn calls the destructor of _asyncBlockingOperationsExecutor (this thread pool executor is shared by all donor access blockers and is destroyed when no access blockers exist). The executor destructor shuts the executor down and waits for it to join. But the executor's join() blocks waiting for the current "TenantMigrationBlockerAsync-X" thread to exit, while that same "TenantMigrationBlockerAsync-X" thread is the one blocked inside the executor's _join() waiting for it to complete. The result is a self-deadlock and leakage of "TenantMigrationBlockerAsync" threads.
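For illustration, here is a minimal standalone C++ sketch of the failure mode (stand-in types only, not MongoDB code; names like Executor and Blocker are made up here). The last shared owner of the blocker is released from a task running on the blocker's own executor thread, so the destructor ends up joining the thread it is running on. std::thread detects the self-join and the sketch terminates with resource_deadlock_would_occur; the real executor instead blocks in join(), leaking the thread.

    #include <chrono>
    #include <memory>
    #include <thread>

    // Stand-in for the blocker's thread pool (_asyncBlockingOperationsExecutor).
    struct Executor {
        std::thread worker;
        ~Executor() {
            if (worker.joinable())
                worker.join();  // Self-join: std::thread throws (and, escaping a
                                // destructor, terminates the sketch). The real pool
                                // instead waits forever for the current thread.
        }
    };

    // Stand-in for TenantMigrationDonorAccessBlocker.
    struct Blocker {
        Executor executor;
    };

    int main() {
        auto blocker = std::make_shared<Blocker>();

        // The canRead continuation captures the blocker and runs on its own executor (step 7).
        blocker->executor.worker = std::thread([blocker]() mutable {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            blocker.reset();  // Last owner released on the worker thread:
                              // ~Blocker -> ~Executor -> join() on this very thread.
        });

        blocker.reset();  // The registry drops its reference (step 6).
        std::this_thread::sleep_for(std::chrono::seconds(1));
        return 0;
    }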

Note: The same problem exists on the recipient side as well.



 Comments   
Comment by Githook User [ 14/Jan/22 ]

Author:

{'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}

Message: SERVER-62340 Move executor to TenantAccessBlockerRegistry to prevent destruction issues
Branch: master
https://github.com/mongodb/mongo/commit/a22829787f370bd4d7e4fd767517dd616169638e

Comment by Esha Maharishi (Inactive) [ 12/Jan/22 ]

Sounds good. Just a note that I'm not sure the first bullet would be an issue in practice. I didn't get to mention this on Zoom, but there are existing places where we block user threads and unblock them all at once, such as on exiting the sharding migration critical section.

Comment by Didier Nadeau [ 12/Jan/22 ]

Following a discussion with esha.maharishi and suganthi.mani, we decided to go ahead with Suganthi's idea to move ownership of the ThreadPool to `TenantMigrationAccessBlockerRegistry` and make it a `shared_ptr`. Since the pool will then always exist, we will set `minThreads` to 0 so that no threads exist when there is no migration, removing any impact on non-serverless instances.
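For illustration, a minimal standalone sketch of that ownership change (stand-in types; the accessor name and the maxThreads value are made up here and are not taken from the actual patch). Because the registry is a long-lived singleton that outlives every access blocker, destroying the last blocker no longer tears the executor down from one of its own threads:

    #include <memory>

    // Stand-in for mongo's ThreadPool.
    struct ThreadPool {
        struct Options {
            int minThreads = 0;
            int maxThreads = 8;
        };
        explicit ThreadPool(Options opts) : options(opts) {}
        Options options;
    };

    class TenantMigrationAccessBlockerRegistry {
    public:
        TenantMigrationAccessBlockerRegistry() {
            ThreadPool::Options options;
            options.minThreads = 0;  // No idle threads while no migration is running,
                                     // so non-serverless instances are unaffected.
            options.maxThreads = 8;  // Illustrative value.
            _asyncBlockingOperationsExecutor = std::make_shared<ThreadPool>(options);
        }

        // Blockers borrow shared ownership of the registry's executor.
        std::shared_ptr<ThreadPool> getAsyncBlockingOperationsExecutor() const {
            return _asyncBlockingOperationsExecutor;
        }

    private:
        std::shared_ptr<ThreadPool> _asyncBlockingOperationsExecutor;
    };

    // An access blocker's destructor now only drops a reference; it never shuts
    // the executor down, so the deadlock described above cannot occur.
    class TenantMigrationDonorAccessBlocker {
    public:
        explicit TenantMigrationDonorAccessBlocker(std::shared_ptr<ThreadPool> executor)
            : _executor(std::move(executor)) {}

    private:
        std::shared_ptr<ThreadPool> _executor;
    };

    int main() {
        TenantMigrationAccessBlockerRegistry registry;
        auto blocker = std::make_shared<TenantMigrationDonorAccessBlocker>(
            registry.getAsyncBlockingOperationsExecutor());
        blocker.reset();  // Only the blocker goes away; the executor stays with the registry.
        return 0;
    }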

We decided not to go ahead and integrate the `checkIfCanReadOrBlock` future into the caller's future chain, for two reasons:

  • In synchronous mode (one thread per client operation) there can be a very large number of threads waiting on the future at the same time. In turn, this could lead to an overload as all these threads would be unblocked and try to do work at the same time. See comment here.
  • Currently the future is interruptible using the opCtx (`wait(opCtx)`). There does not seem to be a clean and straightforward way to preserve the interruptibility if we integrate the future into the caller's chain (i.e. `future.thenRunOn(callerExecutor).then(...)`)

 

Comment by Suganthi Mani [ 11/Jan/22 ]

didier.nadeau To my understanding, there will be an active POS executor only on the primary. But we maintain the access blocker and its executor on both the primary and secondaries. So I think idea #2 (retrieve the access blocker's executor and run a simple lambda on the POS's executor) probably won't work.

Comment by Didier Nadeau [ 11/Jan/22 ]

Some ideas:

  • Finish the continuation on a different executor that does nothing but capture the access_blocker. This ensures the destructor is not run on the access_blocker's executor, preventing the deadlock.
    • One possible problem with this would be that a very large number of continuations are unblocked at the same time (multiple operations waiting on the future), leading to an executor overload (GitHub comment).
    • This could be the main POS cleanup executor?
  • Retrieve the access_blocker's executor and run a simple lambda on the POS's executor. The lambda would be:

return AsyncTry([accessBlockerExecutorSharedPtrAnchor]() {
           // No-op: the lambda only exists to anchor the access blocker's executor.
       })
    .until([accessBlockerWeakPtr]() {
        // Stop looping once the access blocker has been destroyed, releasing
        // the executor anchor from the POS executor's thread.
        return !accessBlockerWeakPtr.lock();
    })
    .withDelayBetweenIterations(Milliseconds(100))
    .on(posExecutor, CancellationToken::uncancelable());  // Truncated in the original;
                                                          // an uncancelable token is assumed here.

    • Possible issue: handling shutdown of the POS executor?
Comment by Esha Maharishi (Inactive) [ 10/Jan/22 ]

didier.nadeau and matt.broadstone to triage along with Implement Split work tomorrow.
