[SERVER-62176] Don't hold the Fetcher::_mutex while emplacing the completion promise Created: 17/Dec/21  Updated: 20/Dec/21  Resolved: 20/Dec/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Broadstone Assignee: Matt Broadstone
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-62185 Ensure TenantMigrationRecipentService... Closed
Operating System: ALL
Participants:
Linked BF Score: 131

 Description   

We must ensure that a Fetcher may be immediately destroyed after emplacing a value in the Fetcher::_completionPromise. This prevents deadlock since the Fetcher destructor calls shutdown which also acquires the Fetcher::_mutex.

Consider the following snippet:

{
  auto fetcher = makeFetcherSharedPointer(...);
  fetcher.onCompletion().thenRunOn(executorWhichRejectsWork).then([fetcher] {
    // this will never run
  });
}

An executor which rejects work (likely due to being shutdown) will never run the completion, and so the fetcher destructor will be run inline with the fulfillment of the future.



 Comments   
Comment by Matt Broadstone [ 20/Dec/21 ]

After further investigation, I don't believe this will fix the issue since the internal RetryRemoteCommandScheduler has a similar problem leading to a self-join. Today we need to make sure that references to a Fetcher live until after their completion promise become ready (if you are waiting on it), and await existing plans to develop a fully async replacement for Fetcher.

Here is a reproducing test case for future interested parties:

TEST_F(FetcherTest, CompletionFutureReleasesMutexBeforeBecomingReady) {
    auto fut = [&] {
        auto scopedFetcher = std::make_shared<Fetcher>(
            &getExecutor(), source, "scoped", findCmdObj, doNothingCallback);
        ASSERT_OK(scopedFetcher->schedule());
 
        auto rejectingExecutor = RejectingExecutor::make();
        return scopedFetcher->onCompletion().thenRunOn(rejectingExecutor).then([scopedFetcher] {
            MONGO_UNREACHABLE;
        });
    }();
 
    {
        // similar to `processNetworkResponse` but without assertions about `fetcher`
        const BSONObj doc = BSON("_id" << 1);
        auto response BSON("cursor" << BSON("id" << 0LL << "ns"
                                                 << "scoped.coll"
                                                 << "firstBatch" << BSON_ARRAY(doc))
                                    << "ok" << 1);
 
        executor::NetworkInterfaceMock::InNetworkGuard guard(getNet());
        getNet()->scheduleSuccessfulResponse(response);
        getNet()->runReadyNetworkOperations();
    }
 
    fut.wait();
}

Generated at Thu Feb 08 05:54:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.