[SERVER-49435] uassert in NetworkInterfaceTL::setTimer can cause server to crash if connection future not immediately ready Created: 10/Jul/20  Updated: 29/Oct/23  Resolved: 23/Jun/21

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: 5.0.0-rc4, 4.4.9, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Janna Golden Assignee: George Wangensteen
Resolution: Fixed Votes: 0
Labels: post-rc0, servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-57893 Make rsm_horizon_change.js resilient ... Closed
Related
is related to SERVER-49434 Mark all 'getAsync' calls as noexcept... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Service arch 2020-11-30, Service Arch 2021-05-17, Service Arch 2021-05-31, Service Arch 2021-06-14, Service Arch 2021-06-28
Participants:
Case:
Linked BF Score: 37
Story Points: 5

 Description   

We can throw NetworkInterfaceExceededTimeLimit in NetworkInterfaceTL::CommandStateBase::setTimer() if the connection future is not immediately ready This is a problem in both the exhaust and non-exhaust paths - if the connection future isn't immediately ready we call 'trySend()' in a getAsync continuation call that runs on the reactor (here in the non-exhaust case and here in the exhaust case).

I had previously tried to catch this error in the exhaust case in SERVER-48493, but after the BF reoccured realized I hadn't fully diagnosed the issue and this fix actually does nothing. I know we decided to throw here to save us from some unnecessary work, but I'm not sure if we thought about the fact that we could crash in some cases. Since this affects both the exhaust and non-exhaust paths, I'll leave the broader solution up to service arch to decide what to do here. As a part of this ticket, it would be nice to essentially revert my changes that I made as a part of SERVER-48493 because they don't actually do anything.

 

Acceptance criteria:

Write a repro. 
Handle the NetworkInterfaceExceededTimeLimit error gracefully without letting the server crash.



 Comments   
Comment by Githook User [ 05/Aug/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-49435 Make NetworkInterfaceTL::ExhaustCommandState::sendRequest handle exceptions

(cherry picked from commit 32d4d90cb12b46a57101b5de4e9ba08b5ab50c9e)
Branch: v4.4
https://github.com/mongodb/mongo/commit/d9f44b62d17205d90ee9d426044c155951fabb11

Comment by Githook User [ 23/Jun/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-49435 Make NetworkInterfaceTL::ExhaustCommandState::sendRequest handle exceptions

(cherry picked from commit 32d4d90cb12b46a57101b5de4e9ba08b5ab50c9e)
Branch: v5.0
https://github.com/mongodb/mongo/commit/e0a85e5f9942962e67633170fe64d985c6a2b5b9

Comment by Githook User [ 23/Jun/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-49435 Make NetworkInterfaceTL::ExhaustCommandState::sendRequest handle exceptions
Branch: master
https://github.com/mongodb/mongo/commit/32d4d90cb12b46a57101b5de4e9ba08b5ab50c9e

Comment by George Wangensteen [ 13/May/21 ]

Revert of SERVER-48493: https://github.com/mongodb/mongo/commit/6a20534d96e97958db915896634b2dea58a81328

Comment by Amirsaman Memaripour [ 21/Apr/21 ]

I did some investigation and reproduced this for the exhaust path. For non-exhaust requests, however, the process of creating the future-continuation is capable of handling exceptions (see below), so this is not an issue for non-exhaust commands.

Future<RemoteCommandResponse> NetworkInterfaceTL::CommandState::sendRequest(
    std::shared_ptr<RequestState> requestState) {
    return makeReadyFutureWith([this, requestState] {
               setTimer();
               return RequestState::getClient(requestState->conn)
                   ->runCommandRequest(*requestState->request, baton);
           })
        .then([this, requestState](RemoteCommandResponse response) {
            doMetadataHook(RemoteCommandOnAnyResponse(requestState->host, response));
            return response;
        });
}

Generated at Thu Feb 08 05:19:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.