Core Server / SERVER-68680

opportunistic{Read,Write} can deadlock when recursively invoked inline

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.1.0-rc3, 6.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.1

      Apply this diff and run transport_layer_asio_test

      diff --git a/src/mongo/transport/session_asio.cpp b/src/mongo/transport/session_asio.cpp
      index 7d5daa4d59f..566e94ff219 100644
      --- a/src/mongo/transport/session_asio.cpp
      +++ b/src/mongo/transport/session_asio.cpp
      @@ -45,6 +45,7 @@ namespace mongo::transport {
       MONGO_FAIL_POINT_DEFINE(transportLayerASIOshortOpportunisticReadWrite);
       MONGO_FAIL_POINT_DEFINE(transportLayerASIOSessionPauseBeforeSetSocketOption);
       MONGO_FAIL_POINT_DEFINE(transportLayerASIOBlockBeforeOpportunisticRead);
      +MONGO_FAIL_POINT_DEFINE(transportLayerASIOBlockBeforeAddSession);
       
       namespace {
       
      @@ -574,6 +575,7 @@ Future<void> TransportLayerASIO::ASIOSession::opportunisticRead(
                   return makeCanceledStatus();
               if (auto networkingBaton = baton ? baton->networking() : nullptr;
                   networkingBaton && networkingBaton->canWait()) {
      +            transportLayerASIOBlockBeforeAddSession.pauseWhileSet();
                   return networkingBaton->addSession(*this, NetworkingBaton::Type::In)
                       .onError([](Status error) {
                           if (ErrorCodes::isShutdownError(error)) {
      diff --git a/src/mongo/transport/transport_layer_asio_test.cpp b/src/mongo/transport/transport_layer_asio_test.cpp
      index 05e4bf52bdd..f1580e41690 100644
      --- a/src/mongo/transport/transport_layer_asio_test.cpp
      +++ b/src/mongo/transport/transport_layer_asio_test.cpp
      @@ -1071,6 +1071,24 @@ TEST_F(BatonASIOLinuxTest, CancelAsyncOperationsInterruptsOngoingOperations) {
                              ErrorCodes::CallbackCanceled);
       }
       
      +TEST_F(BatonASIOLinuxTest, Repro) {
      +    auto opCtx = client().makeOperationContext();
      +    auto baton = opCtx->getBaton();
      +    MilestoneThread thread([&](Notification<void>& isReady) {
      +        FailPointEnableBlock fp("transportLayerASIOBlockBeforeAddSession");
      +        isReady.set();
      +        waitForTimesEntered(fp, 1);
       +        // The client thread has acquired _asyncOpMutex for the session
       +        // and is blocked before adding it to the baton. Detach the
       +        // baton from the opCtx so that addSession returns a ready future.
      +        opCtx.reset();
      +    });
      +
      +    auto f = client().session()->asyncSourceMessage(baton);
       +    // Expect this to hang: the continuation runs inline and recursively
       +    // attempts to re-acquire _asyncOpMutex after addSession returns a
       +    // ready future.
      +}
      +
       #endif  // __linux__
       
       }  // namespace
       
    • Sprint: Service Arch 2022-09-05, Service Arch 2022-09-19
    • 12

      On ASIOSessions where a networking baton is available and async networking is in use, opportunistic{Read,Write} attempt to add the ASIOSession to the baton's pollset via Baton::addSession when the socket operation would block. addSession returns a future that resolves with success once the socket is ready for work. The opportunistic{Read,Write} functions therefore chain a continuation onto the future returned by addSession that invokes them again recursively to work with the now-ready socket.

      Normally, at the time the continuation is chained, the socket is not ready for work, so the continuation is not run inline because the future returned by addSession is not yet ready. However, if the future is already ready when the continuation is chained, as can happen when the baton has been detached, the continuation is invoked inline. Because these functions acquire _asyncOpMutex before chaining the continuation, the recursive inline invocation deadlocks attempting to re-acquire that mutex.

      This deadlock may occur on a networking reactor thread, since those threads often call asyncSourceMessage/asyncSinkMessage. Additionally, note that this deadlock will hang any other threads that subsequently try to acquire _asyncOpMutex, such as client threads (via batons) attempting to cancel outstanding networking operations.
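      The inline-invocation hazard described above can be sketched in standalone C++. All names below (Future, NonReentrantLock, asyncOpLock, opportunisticRead) are hypothetical stand-ins, not the actual MongoDB Future or session API: a future that runs its continuation inline when it is already ready re-enters a function that still holds a non-reentrant lock. A plain flag stands in for _asyncOpMutex so the program can report the re-acquisition instead of actually hanging.

      ```cpp
      #include <functional>
      #include <iostream>

      // A future that, like the one returned by Baton::addSession after the
      // baton is detached, may already be ready when a continuation is chained.
      struct Future {
          bool ready;
          void then(const std::function<void()>& cb) {
              if (ready) {
                  cb();  // ready future: continuation runs inline, on this thread
              }
              // A non-ready future would instead store cb and run it later from
              // the reactor, which is the case the real code was written for.
          }
      };

      // Non-reentrant lock flag standing in for _asyncOpMutex; using a flag
      // lets us detect the re-acquisition instead of blocking forever.
      struct NonReentrantLock {
          bool held = false;
          bool tryAcquire() {
              if (held)
                  return false;
              held = true;
              return true;
          }
          void release() { held = false; }
      };

      NonReentrantLock asyncOpLock;
      bool wouldDeadlock = false;

      // Mirrors the shape of opportunisticRead: acquire the lock, then chain a
      // continuation that re-enters the same function once the socket is ready.
      void opportunisticRead(Future f) {
          if (!asyncOpLock.tryAcquire()) {
              // In the real code this is a blocking mutex acquire: with the lock
              // already held by this same thread, the thread would hang forever.
              wouldDeadlock = true;
              return;
          }
          f.then([&f] { opportunisticRead(f); });  // recursive re-entry point
          asyncOpLock.release();
      }

      int main() {
          opportunisticRead(Future{/*ready=*/true});
          std::cout << (wouldDeadlock ? "deadlock" : "ok") << "\n";
          return 0;
      }
      ```

      With a non-ready future the recursive call happens later, on a fresh stack after the lock is released, which is why the bug only surfaces in the ready-future (detached-baton) path.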

            Assignee: Amirsaman Memaripour (amirsaman.memaripour@mongodb.com)
            Reporter: George Wangensteen (george.wangensteen@mongodb.com)
            Votes: 0
            Watchers: 5

              Created:
              Updated:
              Resolved: