Type: Improvement
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Fully Compatible
Service Arch 2022-2-21, Service Arch 2022-03-07, Service Arch 2022-03-21, Service Arch 2022-04-04, Service Arch 2022-04-18, Service Arch 2022-05-02, Service Arch 2022-05-16, Service Arch 2022-05-30, Service Arch 2022-06-13
4
On sharded clusters, client requests and system operations need to be routed from mongos to mongod, and occasionally from mongod to mongod. Each such request must acquire an outbound (egress) connection from the source host to the target host. Acquisition is asynchronous, and we currently have no measurement of how long requests wait for a connection. As a result, it's difficult to determine when egress connection pooling is a bottleneck for servicing requests, or how much latency a strained connection pool adds to requests.
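A minimal sketch of the measurement itself, assuming a toy single-host pool: each request is stamped when it starts waiting, and the wait is computed when a connection is finally handed to it. `ToyEgressPool`, `Connection`, and `onConnectionAvailable` are hypothetical names for illustration, not the server's connection-pool API. Time points are passed in explicitly so the example is deterministic; real code would read `steady_clock::now()` at both points.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// Hypothetical stand-in for an egress connection handle to a target host.
struct Connection {
    std::string target;
};

// Invoked once a connection is handed over, with the time spent waiting.
using AcquireCallback = std::function<void(Connection, Micros)>;

// Hypothetical toy pool for a single target host: requests queue up until a
// connection becomes available, and each request is stamped when it starts waiting.
class ToyEgressPool {
public:
    explicit ToyEgressPool(std::string target) : _target(std::move(target)) {}

    // Asynchronous acquisition: remember when the request began waiting.
    void acquire(Clock::time_point now, AcquireCallback cb) {
        assert(cb);  // a request must supply a completion callback
        _waiters.push_back(Waiter{now, std::move(cb)});
    }

    // A connection became available: hand it to the oldest waiter and report
    // how long that waiter was queued. Returns false if nobody was waiting.
    bool onConnectionAvailable(Clock::time_point now) {
        if (_waiters.empty()) {
            return false;
        }
        Waiter w = std::move(_waiters.front());
        _waiters.pop_front();
        auto waited = std::chrono::duration_cast<Micros>(now - w.enqueuedAt);
        w.callback(Connection{_target}, waited);
        return true;
    }

private:
    struct Waiter {
        Clock::time_point enqueuedAt;
        AcquireCallback callback;
    };
    std::string _target;
    std::deque<Waiter> _waiters;
};
```

The key point is that the wait must be measured from the moment the request is queued to the moment the callback fires, not around a single function call, because acquisition completes on a different code path than the one that requested it.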
We should add metrics that answer the question: 'how much time does a request from (serverA) to (serverB) spend waiting to acquire a connection?' A histogram-based approach in the style of SERVER-59858 might work best, where we maintain a histogram of wait times for the last N connection acquisitions or over the last X minutes. We could also rotate the histograms: always keep one for the most recent window (say, one minute), plus an aggregated one covering the last N minutes. Ideally, the histograms would also be collected per target host.
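One way the histogram idea could look, as a hypothetical C++ sketch: fixed wait-time buckets, a current window that is rotated into a bounded history of past windows, and one rotating histogram per target host. The bucket bounds, the five-window default, and all class names are assumptions for illustration, not the design used in SERVER-59858 or in the server.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

using Micros = std::chrono::microseconds;

// Hypothetical inclusive bucket upper bounds, in microseconds; one extra
// overflow bucket holds anything slower than the last bound.
constexpr std::array<std::int64_t, 6> kBoundsMicros{1'000, 5'000, 10'000, 50'000, 100'000, 500'000};
constexpr std::size_t kNumBuckets = kBoundsMicros.size() + 1;

class WaitTimeHistogram {
public:
    void record(Micros wait) {
        std::size_t i = 0;
        while (i < kBoundsMicros.size() && wait.count() > kBoundsMicros[i]) {
            ++i;  // move past every bound the wait exceeds
        }
        ++_counts[i];
        ++_total;
    }
    void merge(const WaitTimeHistogram& other) {
        for (std::size_t i = 0; i < kNumBuckets; ++i) {
            _counts[i] += other._counts[i];
        }
        _total += other._total;
    }
    std::uint64_t total() const { return _total; }
    std::uint64_t bucket(std::size_t i) const { return _counts.at(i); }

private:
    std::array<std::uint64_t, kNumBuckets> _counts{};
    std::uint64_t _total = 0;
};

// Keeps the histogram for the current window plus up to `maxWindows` completed
// windows; the caller decides when a window ends (e.g. once a minute).
class RotatingHistogram {
public:
    explicit RotatingHistogram(std::size_t maxWindows = 5) : _maxWindows(maxWindows) {
        assert(maxWindows > 0);
    }
    void record(Micros wait) { _current.record(wait); }
    void rotate() {
        _past.push_back(_current);
        _current = WaitTimeHistogram{};
        if (_past.size() > _maxWindows) {
            _past.pop_front();  // drop the oldest completed window
        }
    }
    const WaitTimeHistogram& current() const { return _current; }
    // Merge of the retained completed windows plus the current one.
    WaitTimeHistogram aggregated() const {
        WaitTimeHistogram out = _current;
        for (const auto& h : _past) {
            out.merge(h);
        }
        return out;
    }

private:
    std::size_t _maxWindows;
    WaitTimeHistogram _current;
    std::deque<WaitTimeHistogram> _past;
};

// Per-target-host metrics, keyed by a "host:port" string.
class EgressWaitMetrics {
public:
    void record(const std::string& target, Micros wait) { _byHost[target].record(wait); }
    void rotateAll() {
        for (auto& entry : _byHost) {
            entry.second.rotate();
        }
    }
    const RotatingHistogram* forHost(const std::string& target) const {
        auto it = _byHost.find(target);
        return it == _byHost.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, RotatingHistogram> _byHost;
};
```

A background timer (not shown) would call `rotateAll()` once per window. Computing `aggregated()` on read keeps `record()` cheap on the hot path, at the cost of a merge each time the metrics are reported.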
Generally, requests acquire egress connections here for the NITL-based task executors. ScopedDBConnection, defined here, is also sometimes used to acquire connections, namely in the old "scanning" RSM, dbclient_rs, and one or two sharded commands, but it may not be worth collecting metrics for this outdated component.
- depends on
  - SERVER-66739 Fix wasNeverUsed in conn_pool_stats.js (Closed)
- is depended on by
  - TOOLS-3132 Investigate changes in SERVER-63261: Add metrics for wait time for requests to acquire egress connections (Closed)