[SERVER-72939] Fix data-race in `SessionWorkflow` unit-tests Created: 17/Jan/23  Updated: 29/Oct/23  Resolved: 26/Jan/23

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
is caused by SERVER-68875 Reflow the SessionWorkflow loop Closed
Related
related to SERVER-68723 align SessionWorkflowTest with mockin... Closed
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2023-02-06
Participants:
Linked BF Score: 58

 Description   

The unit-tests for SessionWorkflow use a thread-pool to handle requests, implying that the command processing part runs on a separate thread than the one that initiates it:

    Future<DbResponse> _handleRequest(OperationContext* opCtx, const Message& request) noexcept {
        auto [p, f] = makePromiseFuture<DbResponse>();
        ExecutorFuture<void>(_threadPool)
            .then([this, opCtx, &request, p = std::move(p)]() mutable {
                p.setWith([&] { return _processRequest(opCtx, request); });
            })
            .getAsync([](auto&&) {});
        return std::move(f);
    }

Since we are using ExecutorFuture here, the rest of the continuation shown below may run on a thread that doesn't have the Client attached:

...
    auto fr = std::make_shared<Frame>(shared_from_this());
    return _getNextWork()
        .then([&, fr](auto work) {
            fr->metrics.received();
            invariant(!_work);
            _work = std::move(work);
            return _dispatchWork();
        })
        .then([&, fr](auto rsp) {
            _acceptResponse(std::move(rsp));
            fr->metrics.processed();
            _sendResponse();
            fr->metrics.sent(*session());
            _yieldPointReached();
            fr->metrics.yielded();
        });
...

As part of running _acceptResponse, we detach and kill the opCtx that must happen on a thread than has the Client attached (through ClientStrand), but that may not be the case in unit-tests (for both dedicated and shared threading models).



 Comments   
Comment by Billy Donahue [ 26/Jan/23 ]

fixed by SERVER-68723

Comment by Billy Donahue [ 23/Jan/23 ]

When SERVER-68875 removed the ClientStrand usage, I incorrectly thought it was redundant with the SessionWorkflow changes I was making. The SessionWorkflow was changed so that all scheduled tasks are wrapped by the _captureContext wrapper which enforces ClientStrand attachment. But yeah if the test task is going to use an ExecutorFuture to complete the Future<DbResponse> it's acting like a badly-behaved command.

We were thinking that this BF would be solved by SERVER-68723, and that's true.
But that code will right now just have the handleRequest future filled on the requesting thread.

If it's important to test handleRequest using a threadPool I'll restore that functionality and do it with a ClientStrand.
But we'll solve this with SERVER-68723 one way or another.

Generated at Thu Feb 08 06:23:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.