-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: Query Execution
-
None
-
Query Execution
-
Fully Compatible
-
ALL
-
v8.0
-
QE 2025-02-03
-
0
-
None
-
None
-
None
-
None
-
None
-
None
-
None
The destructor of the AsyncResultsMerger has the following invariant to make sure that the AsyncResultsMerger is only destroyed if the remote cursors have either been killed or fully consumed:
invariant(_lifecycleState == kKillComplete || _remotesExhausted(lk));
The AsyncResultsMerger can schedule async requests to fetch more data from remote cursors. These requests are executed eventually by an arbitrary thread from the task executor's thread pool. If any of these requests is still in flight while the AsyncResultsMerger's destructor is executing, this leads to the invariant being violated. We have seen this happen in BF-35922 and in BF-33891.
To be on the safe side and avoid the invariant failure, the destructor of an AsyncResultsMerger should only be executed once the AsyncResultsMerger has processed all outstanding requests, or if its kill(...) method has been called.
This is however not guaranteed, at least not in our unit tests.
Some of our unit tests create BlockingResultsMerger objects on the stack. Each BlockingResultsMerger contains an AsyncResultsMerger as a member.
Using it as follows in a unit test is not exception-safe and can trigger the invariant failure if an exception is thrown somewhere:
{ BlockingResultsMerger brm(...); // Throws an exception! brm.doSomething(); // Not reached because of previous exception. krm.kill(operationContext()); // End of the scope. // This will destroy the BlockingResultsMerger and its AsyncResultsMerger. // As the kill() command has not been executed, this will trigger the invariant failure // in the destructor of the AsyncResultsMerger. }
Forgetting to call `kill(...)` before shutting down the object can lead to subtle bugs which are hard to reproduce.
The root cause for BF-35922 is that it does not call `kill(...)` on the BlockingResultsMerger object, which then leads to a race between the AsyncResultsMerger processing its async remote response in a background thread of the task executor, and the main thread destroying the BlockingResultsMerger (which in turn destroys its AsyncResultsMerger instance).
Throwing the invariant failure in the destructor in case not all results have been processed is currently necessary to avoid undefined behavior.
This is because the spawned async requests currently do not ensure that the AsyncResultsMerger instance that spawned them is still alive when its callback function executes.
We should be able to improve on this by creating all AsyncResultsMerger instances as shared_ptrs, and capturing a copy of that shared_ptr when scheduling the async callback function.
That way we can ensure that the AsyncResultsMerger instance will still be alive when the callback function executes, and avoid undefined behavior.
Passing a copy of the shared_ptr also ensures that the AsyncResultsMerger's destructor is only executed if there are no async requests in flight.
- related to
-
SERVER-102422 Remove invariant in AsyncResultsMerger destructor
-
- Closed
-