Core Server / SERVER-35132

Regression: all connections to {{mongos}} forced to reconnect during failover for clients with tight deadlines

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.1.8
    • Affects Version/s: 3.2.20, 3.4.15, 3.6.4
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:
      1. Set socketTimeout to 2s
      2. Set $maxTimeMS to 1s for all queries
      3. Throw lots of queries at the cluster
      4. rs.stepDown()
    • Sprint: Service Arch 2019-01-14, Service Arch 2019-01-28

      For clients of mongos that have tight deadlines, such as those that expect all queries to take less than 1s and that have maxTimeMS and socketTimeout set appropriately (1s and 2s respectively in our testing), a failover will force all connections from the client bound for the shard in transition to close and be re-established. This can be problematic for environments with lots of connections (in addition to high throughput), as establishing connections can be expensive (e.g., thread create/destroy per connection).
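
      As a rough sketch of the kind of client described above (not the exact harness used in our testing), a mongocxx-based loop with a 2s socket timeout and a 1s maxTimeMS might look like the following; the host name and namespace are placeholders:

      #include <chrono>
      #include <exception>

      #include <bsoncxx/builder/basic/document.hpp>
      #include <mongocxx/client.hpp>
      #include <mongocxx/instance.hpp>
      #include <mongocxx/options/find.hpp>
      #include <mongocxx/uri.hpp>

      int main() {
          mongocxx::instance inst{};  // driver bootstrap, once per process

          // socketTimeoutMS=2000: give up on a connection that is silent for 2s.
          mongocxx::client conn{
              mongocxx::uri{"mongodb://mongos.example.net:27017/?socketTimeoutMS=2000"}};
          auto coll = conn["test"]["data"];

          // maxTimeMS=1000: ask the server to abandon any query after 1s.
          mongocxx::options::find opts;
          opts.max_time(std::chrono::milliseconds{1000});

          // Throw lots of queries at the cluster; run rs.stepDown() on a shard
          // primary while this loop is executing to observe the behavior above.
          for (int i = 0; i < 10000; ++i) {
              try {
                  auto cursor = coll.find(bsoncxx::builder::basic::make_document(), opts);
                  for (auto&& doc : cursor) {
                      (void)doc;  // drain the cursor
                  }
              } catch (const std::exception&) {
                  // With the current mongos behavior, operations routed to the shard
                  // in transition block well past maxTimeMS, the 2s socket timeout
                  // fires, and the connection must be re-established.
              }
          }
          return 0;
      }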

      It should be noted that while setting socketTimeout to be greater than the failover period would allow existing connections to persist rather than time out, this is not a solution either. It simply causes the application either to queue operations excessively on its side while waiting for existing connections to free up, or to open new connections in the interim to service those operations, again consuming excessive resources on the mongos side and depriving the application of the timely feedback it requires.

      Prior to 3.2, this was not an issue: mongos would immediately pass {{ReplicaSetMonitor no master found for set}} errors back to the client, allowing it to decide how to handle retries while reusing existing connections. Since 3.2, however, the client connection to mongos will hang while trying to find an acceptable replica set member until its configured timeout expires (20s in versions >= 3.4, 11s in 3.2) or until an acceptable member becomes available, with no way to control that timeout.
      Digging through the code in version 3.2, we can see that this issue is understood by the developers and that there is an intention to address it (see src/mongo/client/remote_command_targeter.cpp):

      // This value is used if the operation doesn't have a user-specified max wait time. It should be
      // closer to (preferably higher than) the replication electionTimeoutMillis in order to ensure that
      // lack of primary due to replication election does not cause findHost failures.
      const Seconds kDefaultFindHostMaxWaitTime(11);
      
      ...
      
      Milliseconds RemoteCommandTargeter::selectFindHostMaxWaitTime(OperationContext* txn) {
          // TODO: Get remaining max time from 'txn'.
          Milliseconds remainingMaxTime(0);
          if (remainingMaxTime > Milliseconds::zero()) {
              return std::min(remainingMaxTime - kFindHostTimeoutPad,
                              Milliseconds(kDefaultFindHostMaxWaitTime));
          }
      
          return kDefaultFindHostMaxWaitTime;
      }
      

      Here we see that the default time to wait for an acceptable server is 11s, that the intention is to allow the client to influence this time ("This value is used if the operation doesn't have a user-specified max wait time."), presumably via $maxTimeMS, and that this is yet to be implemented ("TODO: Get remaining max time from 'txn'"). We also see this acknowledged in other parts of the code in 3.2 (see src/mongo/s/query/async_results_merger.cpp):

          // TODO: Pass down an OperationContext* to use here.
          auto findHostStatus = shard->getTargeter()->findHost(
              readPref, RemoteCommandTargeter::selectFindHostMaxWaitTime(nullptr));
      

      This code gets executed via the following code path:

      • mongo/client/replica_set_monitor.cpp:520 ReplicaSetMonitor::Refresher::getNextStep
      • mongo/client/replica_set_monitor.cpp:815 ReplicaSetMonitor::Refresher::_refreshUntilMatches
      • mongo/client/replica_set_monitor.h:274 ReplicaSetMonitor::Refresher::refreshUntilMatches
      • mongo/client/replica_set_monitor.cpp:317 ReplicaSetMonitor::getHostOrRefresh
        • 500ms backoff here
      • mongo/client/remote_command_targeter_rs.cpp:61 RemoteCommandTargeterRS::findHost
      • mongo/s/query/async_results_merger.cpp:652 AsyncResultsMerger::RemoteCursorData::resolveShardIdToHostAndPort
        • RemoteCommandTargeter::selectFindHostMaxWaitTime called here to retrieve the 11s max wait time
      • mongo/s/query/async_results_merger.cpp:256 AsyncResultsMerger::askForNextBatch_inlock
      • mongo/s/query/async_results_merger.cpp:315 AsyncResultsMerger::nextEvent
      • mongo/s/query/router_stage_merge.cpp:43 RouterStageMerge::next
      • mongo/s/query/cluster_client_cursor_impl.cpp:75 ClusterClientCursorImpl::next
      • mongo/s/query/cluster_find.cpp:196 runQueryWithoutRetrying
      • mongo/s/query/cluster_find.cpp:348 ClusterFind::runQuery
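
      For illustration only, completing that TODO might look roughly like the sketch below; the accessor used to read the operation's remaining max time is our assumption about the OperationContext API in this branch, not a verified signature:

      // Hypothetical sketch, not actual server code: bound the findHost wait
      // by the operation's remaining max time instead of always using 11s.
      Milliseconds RemoteCommandTargeter::selectFindHostMaxWaitTime(OperationContext* txn) {
          // Assumed accessor for the operation's remaining max time; the exact
          // getter on OperationContext may differ in this branch.
          Milliseconds remainingMaxTime =
              txn ? txn->getRemainingMaxTimeMillis() : Milliseconds(0);

          if (remainingMaxTime > Milliseconds::zero()) {
              // Never wait longer than the operation is allowed to run, minus a
              // small pad so failure is reported before the client's deadline.
              return std::min(remainingMaxTime - kFindHostTimeoutPad,
                              Milliseconds(kDefaultFindHostMaxWaitTime));
          }

          return kDefaultFindHostMaxWaitTime;
      }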

      In versions >= 3.4, RemoteCommandTargeter::selectFindHostMaxWaitTime disappears, but the problem remains: the wait time is instead hard-coded to 20s in various places (see src/mongo/s/query/async_results_merger.cpp):

          // TODO: Pass down an OperationContext* to use here.
          auto findHostStatus = shard->getTargeter()->findHostWithMaxWait(readPref, Seconds{20});
      

      We see further evidence of the intention to fix this issue in src/mongo/client/remote_command_targeter_rs.cpp:

              // Enforce a 20-second ceiling on the time spent looking for a host. This conforms with the
              // behavior used throughout mongos prior to version 3.4, but is not fundamentally desirable.
              // See comment in remote_command_targeter.h for details.
              if (clock->now() - startDate > Seconds{20}) {
                  return host;
              }
      

      And in src/mongo/client/remote_command_targeter.h, as mentioned in the previous comment:

          /**
           * Finds a host matching readPref blocking up to 20 seconds or until the given operation is
           * interrupted or its deadline expires.
           *
           * TODO(schwerin): Once operation max-time behavior is more uniformly integrated into sharding,
           * remove the 20-second ceiling on wait time.
           */
          virtual StatusWith<HostAndPort> findHost(OperationContext* txn,
                                                   const ReadPreferenceSetting& readPref) = 0;
      

      The code path is only slightly different in 3.4:

      • mongo/client/replica_set_monitor.cpp:481 ReplicaSetMonitor::Refresher::getNextStep
      • mongo/client/replica_set_monitor.cpp:797 ReplicaSetMonitor::Refresher::_refreshUntilMatches
      • mongo/client/replica_set_monitor.h:294 ReplicaSetMonitor::Refresher::refreshUntilMatches
      • mongo/client/replica_set_monitor.cpp:266 ReplicaSetMonitor::getHostOrRefresh
        • 500ms backoff here
      • mongo/client/remote_command_targeter_rs.cpp:63 RemoteCommandTargeterRS::findHostWithMaxWait
      • mongo/s/query/async_results_merger.cpp:692 AsyncResultsMerger::RemoteCursorData::resolveShardIdToHostAndPort
        • hard-coded to wait for 20s here
      • mongo/s/query/async_results_merger.cpp:261 AsyncResultsMerger::askForNextBatch_inlock
      • mongo/s/query/async_results_merger.cpp:324 AsyncResultsMerger::nextEvent
      • mongo/s/query/router_stage_merge.cpp:43 RouterStageMerge::next
      • mongo/s/query/cluster_client_cursor_impl.cpp:75 ClusterClientCursorImpl::next
      • mongo/s/query/cluster_find.cpp:153 runQueryWithoutRetrying
      • mongo/s/query/cluster_find.cpp:305 ClusterFind::runQuery

      As noted in the affected versions above, this appears to be the same behavior in 3.6, although we have not yet tested against 3.6.
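
      As an illustration of the kind of change being requested for 3.4 and later, a call site like the one above could cap the hard-coded ceiling by the operation's remaining max time. This is only a sketch: both the availability of an OperationContext at this point (the existing TODO) and the hasDeadline()/getRemainingMaxTimeMillis() accessors are assumptions, not verified APIs in those branches:

          // Hypothetical sketch, not actual server code: cap the 20s ceiling by
          // the operation's remaining max time when the client has set one.
          Milliseconds maxWait = Seconds{20};
          if (opCtx && opCtx->hasDeadline()) {  // assumed accessor
              // assumed accessor for the operation's remaining max time
              maxWait = std::min(maxWait, opCtx->getRemainingMaxTimeMillis());
          }
          auto findHostStatus = shard->getTargeter()->findHostWithMaxWait(readPref, maxWait);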

      Given the developer comments and current undesirable behavior, we would like to see this issue addressed and/or understand what roadblocks are currently preventing implementation of a solution.

            Assignee: Mathias Stearn (mathias@mongodb.com)
            Reporter: Gregory Banks (gregbanks)
            Votes: 1
            Watchers: 16
