[SERVER-57784] TryUntilLoop Does Not Synchronize Destructor and Promise resolution Created: 17/Jun/21  Updated: 29/Oct/23  Resolved: 15/Oct/21

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc2

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
is caused by SERVER-54408 Rewrite AsyncTry-until to not use fut... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Steps To Reproduce:

Known Failing Commit - 8ef4580520429754f85003cdb9d126aede402dbd

Replicating Code Diff

index 0d5e45870e..1da2381fe8 100644
--- a/src/mongo/util/future_util.h
+++ b/src/mongo/util/future_util.h
@@ -328,6 +328,7 @@ private:
                                 resultPromise.setError(asyncTryCanceledStatus());
                             } else if (shouldStopIteration(swResult)) {
                                 resultPromise.setFrom(std::move(swResult));
+                                sleepsecs(5);
                             } else {
                                 runImpl(std::move(resultPromise));
                             }

And run:

build/install/bin/db_s_shard_server_test --filter=NothingToIterate

Sprint: Service Arch 2021-10-18, Service Arch 2021-11-01
Participants:
Linked BF Score: 144
Story Points: 2

 Description   

The current implementation of TryUntilLoop will resolve the promise and make the future ready before the function exits and hence before the destructor runs.

This means that the member variables (executor, body, etc) are destroyed at an arbitrary time after the future is ready.

Possible Solution:

diff --git a/src/mongo/util/future_util.h b/src/mongo/util/future_util.h
index 0d5e45870e..ccd57481fe 100644
--- a/src/mongo/util/future_util.h
+++ b/src/mongo/util/future_util.h
@@ -298,7 +298,7 @@ private:
             auto [promise, future] = makePromiseFuture<ReturnType>();
 
             // Kick off the asynchronous loop.
-            runImpl(std::move(promise));
+            runImpl(this->shared_from_this(), std::move(promise));
 
             return std::move(future).thenRunOn(executor);
         }
@@ -309,30 +309,36 @@ private:
          * reschedule another iteration of the loop.
          */
         template <typename ReturnType>
-        void runImpl(Promise<ReturnType> resultPromise) {
-            executor->schedule(
-                [this, self = this->shared_from_this(), resultPromise = std::move(resultPromise)](
-                    Status scheduleStatus) mutable {
-                    if (!scheduleStatus.isOK()) {
-                        resultPromise.setError(std::move(scheduleStatus));
-                        return;
-                    }
+        void runImpl(std::shared_ptr<TryUntilLoop> self, Promise<ReturnType> resultPromise) {
+            invariant(self.get() == this);
+            executor->schedule([this,
+                                self = std::move(self),
+                                resultPromise =
+                                    std::move(resultPromise)](Status scheduleStatus) mutable {
+                if (!scheduleStatus.isOK()) {
+                    resultPromise.setError(std::move(scheduleStatus));
+                    return;
+                }
 
-                    // Convert the result of the loop body into an ExecutorFuture, even if the
-                    // loop body is not Future-returning. This isn't strictly necessary but it
-                    // makes implementation easier.
-                    makeExecutorFutureWith(executor, executeLoopBody)
-                        .getAsync([this, self, resultPromise = std::move(resultPromise)](
-                                      StatusOrStatusWith<ReturnType>&& swResult) mutable {
+                // Convert the result of the loop body into an ExecutorFuture, even if the
+                // loop body is not Future-returning. This isn't strictly necessary but it
+                // makes implementation easier.
+                makeExecutorFutureWith(executor, executeLoopBody)
+                    .getAsync(
+                        [this, self = std::move(self), resultPromise = std::move(resultPromise)](
+                            StatusOrStatusWith<ReturnType>&& swResult) mutable {
                             if (cancelToken.isCanceled()) {
                                 resultPromise.setError(asyncTryCanceledStatus());
                             } else if (shouldStopIteration(swResult)) {
+                                self.reset();
                                 resultPromise.setFrom(std::move(swResult));
                             } else {
-                                runImpl(std::move(resultPromise));
+                                runImpl(std::move(self), std::move(resultPromise));
+                                sleepsecs(5);
                             }
                         });
-                });
+            });
         }
 
         ExecutorPtr executor;

Acceptance criteria: Reinvestigate the cause of SERVER-53434 and determine whether there's a fix that relies on joining the executor in the test fixture. If not, move this ticket back to Needs Scheduling for a wider discussion with the team.



 Comments   
Comment by Githook User [ 19/Oct/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-57784 Shutdown and join executors before returning from ReshardingOplogApplier tests

(cherry picked from commit 2bf37b1ac522b39e186388ea382a5a14c0636da6)
Branch: v5.0
https://github.com/mongodb/mongo/commit/9808adf67152b46f33f14bc5fa5ff122fdbb23cd

Comment by Githook User [ 19/Oct/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-57784 Shutdown and join executors before returning from ReshardingOplogApplier tests

(cherry picked from commit 2bf37b1ac522b39e186388ea382a5a14c0636da6)
Branch: v5.1
https://github.com/mongodb/mongo/commit/e80660e3fa6b48dcc35414957bc65e373859cb8b

Comment by Githook User [ 15/Oct/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-57784 Shutdown and join executors before returning from ReshardingOplogApplier tests
Branch: master
https://github.com/mongodb/mongo/commit/2bf37b1ac522b39e186388ea382a5a14c0636da6

Comment by Max Hirschhorn [ 29/Jun/21 ]

matthew.saltz, joining the task executor at the end of the test case doesn't provide strong exception safety. See SERVER-53434 as an example for how this can go wrong. SERVER-53538 has bitten me several times during the resharding project that I'd be disappointed for AsyncTry to continue this pitfall with using ExecutorFutures.

Comment by Matthew Saltz (Inactive) [ 29/Jun/21 ]

We discussed as a team and we think this is a test bug. It can be fixed by joining the executor at the end of the test. We're closing this ticket with the expectation that a new one will be opened for that BF.

Generated at Thu Feb 08 05:42:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.