[SERVER-68680] opportunistic{Read,Write} can deadlock when recursively invoked inline Created: 09/Aug/22  Updated: 29/Oct/23  Resolved: 12/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc3, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-64191 Ensure `Session::cancelAsyncOperation... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1
Steps To Reproduce:

Apply this diff and run transport_layer_asio_test

diff --git a/src/mongo/transport/session_asio.cpp b/src/mongo/transport/session_asio.cpp
index 7d5daa4d59f..566e94ff219 100644
--- a/src/mongo/transport/session_asio.cpp
+++ b/src/mongo/transport/session_asio.cpp
@@ -45,6 +45,7 @@ namespace mongo::transport {
 MONGO_FAIL_POINT_DEFINE(transportLayerASIOshortOpportunisticReadWrite);
 MONGO_FAIL_POINT_DEFINE(transportLayerASIOSessionPauseBeforeSetSocketOption);
 MONGO_FAIL_POINT_DEFINE(transportLayerASIOBlockBeforeOpportunisticRead);
+MONGO_FAIL_POINT_DEFINE(transportLayerASIOBlockBeforeAddSession);
 
 namespace {
 
@@ -574,6 +575,7 @@ Future<void> TransportLayerASIO::ASIOSession::opportunisticRead(
             return makeCanceledStatus();
         if (auto networkingBaton = baton ? baton->networking() : nullptr;
             networkingBaton && networkingBaton->canWait()) {
+            transportLayerASIOBlockBeforeAddSession.pauseWhileSet();
             return networkingBaton->addSession(*this, NetworkingBaton::Type::In)
                 .onError([](Status error) {
                     if (ErrorCodes::isShutdownError(error)) {
diff --git a/src/mongo/transport/transport_layer_asio_test.cpp b/src/mongo/transport/transport_layer_asio_test.cpp
index 05e4bf52bdd..f1580e41690 100644
--- a/src/mongo/transport/transport_layer_asio_test.cpp
+++ b/src/mongo/transport/transport_layer_asio_test.cpp
@@ -1071,6 +1071,24 @@ TEST_F(BatonASIOLinuxTest, CancelAsyncOperationsInterruptsOngoingOperations) {
                        ErrorCodes::CallbackCanceled);
 }
 
+TEST_F(BatonASIOLinuxTest, Repro) {
+    auto opCtx = client().makeOperationContext();
+    auto baton = opCtx->getBaton();
+    MilestoneThread thread([&](Notification<void>& isReady) {
+        FailPointEnableBlock fp("transportLayerASIOBlockBeforeAddSession");
+        isReady.set();
+        waitForTimesEntered(fp, 1);
+        // client has acquired asyncOpMutex for the session 
+        // and is waiting to add session. 
+        // detach the baton from the opCtx to ensure addSession returns ready future
+        opCtx.reset();
+    });
+
+    auto f = client().session()->asyncSourceMessage(baton);
+    // expect this to hang due to recursively attempting to acquire asyncOpMutex
+    // after addSession returns a ready future
+}
+
 #endif  // __linux__
 
 }  // namespace
 

Sprint: Service Arch 2022-09-05, Service Arch 2022-09-19
Participants:
Linked BF Score: 12

 Description   

On ASIOSessions where a networking baton is available and async networking is being used, opportunistic{Read,Write} attempt to add the ASIOSession to the baton's pollset via Baton::addSession when the socket operation would block. addSession returns a future that resolves with success when the socket is available for work. opportunsitc{Read,Write} functions therefore chain a continuation to the future returned by addSession that invokes themselves again recursively to work with the available socket.

Normally, at the time the continuation is chained, the socket is not ready for work, so the continuation will not be run inline as the future returned by addSession is not ready. However, if the future is ready when the continuation is chained, as might be the case if the baton has been detached, the functions will be invoked inline. Because the functions acquire _asyncOpMutex before chaining this continuation, the recursive inline invocation will deadlock attempting to re-acquire this mutex.

This deadlock may  occur on a networking reactor thread as these often call asyncSourceMessage/asyncSinkMessage. Additionally, note that this deadlock will hang any other threads that subsequently try and access the _asyncOpMutex, such as client threads (via batons) attempting to cancel outstanding networking operations.



 Comments   
Comment by Githook User [ 15/Sep/22 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-68680 Avoid deadlocks when recursively invoking `opportunisticRead/Write`

(cherry picked from commit e884fbf3ac392ca01d080f68beb96328c7ae4c60)
Branch: v6.1
https://github.com/mongodb/mongo/commit/8f1e14b4361f3b1478f6bd618af8f9a82a8de3b8

Comment by Githook User [ 12/Sep/22 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-68680 Avoid deadlocks when recursively invoking `opportunisticRead/Write`
Branch: master
https://github.com/mongodb/mongo/commit/e884fbf3ac392ca01d080f68beb96328c7ae4c60

Generated at Thu Feb 08 06:11:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.