[SERVER-63261] Add metrics for wait time for requests to acquire egress connections Created: 03/Feb/22  Updated: 29/Oct/23  Resolved: 28/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: George Wangensteen Assignee: Vojislav Stojkovic
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-66739 Fix wasNeverUsed in conn_pool_stats.js Closed
is depended on by TOOLS-3132 Investigate changes in SERVER-63261: ... Closed
Documented
is documented by DOCS-15371 Investigate changes in SERVER-63261: ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Service Arch 2022-2-21, Service Arch 2022-03-07, Service Arch 2022-03-21, Service Arch 2022-04-04, Service Arch 2022-04-18, Service Arch 2022-05-02, Service Arch 2022-05-16, Service Arch 2022-05-30, Service Arch 2022-06-13
Participants:
Story Points: 4

 Description   

On sharded clusters, client requests and system operations need to be routed from mongos to mongod, and occasionally from mongod to mongod. These requests need to acquire an outbound connection from the source host to the target host. This process is asynchronous, and we don't know how long it takes for requests to acquire a connection. As a result, it's difficult for us to determine when egress connection pooling might be a bottleneck for servicing requests, or what the exact impact on request latency is when the connection pool is under strain. 

We should add metrics that answer the question: 'How much time does a request from (serverA) to (serverB) spend waiting to acquire a connection?' It might be best to use a histogram-based approach, in the style of SERVER-59858, where we maintain a histogram of wait times for the last N connections or over the last X minutes. We could also 'rotate' the histograms: always keep one for the last minute (for example), plus an 'aggregated' one covering the last N minutes. Ideally, the histograms would also be collected on a per-target-host basis. 
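
A minimal C++ sketch of the rotating, per-host histogram idea, under several assumptions: the bucket bounds are hypothetical, rotation is driven by an external periodic job calling `rotateAll()` once per interval (for example, once a minute), and `WaitTimeHistogram`, `RotatingHistogram`, and `EgressWaitTimeStats` are illustrative names, not code from the server or from SERVER-59858. The aggregated view here covers the in-progress interval plus up to N completed intervals.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>
#include <string>

// Hypothetical bucket upper bounds, in milliseconds.
constexpr std::size_t kNumBounds = 6;
constexpr std::array<std::int64_t, kNumBounds> kBucketBoundsMs = {1, 5, 10, 50, 100, 1000};

// Fixed-bucket histogram of connection-acquisition wait times.
struct WaitTimeHistogram {
    // counts[i] (i < kNumBounds) holds waits <= kBucketBoundsMs[i] that exceed the
    // previous bound; counts[kNumBounds] holds waits above the largest bound.
    std::array<std::int64_t, kNumBounds + 1> counts{};

    void record(std::int64_t waitMs) {
        std::size_t i = 0;
        while (i < kNumBounds && waitMs > kBucketBoundsMs[i]) ++i;
        ++counts[i];
    }

    void merge(const WaitTimeHistogram& other) {
        for (std::size_t i = 0; i < counts.size(); ++i) counts[i] += other.counts[i];
    }

    std::int64_t total() const {
        std::int64_t sum = 0;
        for (std::int64_t c : counts) sum += c;
        return sum;
    }
};

// One histogram for the in-progress interval plus up to `windowCount` completed ones.
class RotatingHistogram {
public:
    explicit RotatingHistogram(std::size_t windowCount) : _windowCount(windowCount) {}

    void record(std::int64_t waitMs) { _current.record(waitMs); }

    // Closes the current interval and drops the oldest one beyond the window.
    void rotate() {
        _completed.push_back(_current);
        if (_completed.size() > _windowCount) _completed.pop_front();
        _current = WaitTimeHistogram{};
    }

    // Bucket-wise sum of the current interval and all retained completed intervals.
    WaitTimeHistogram aggregated() const {
        WaitTimeHistogram result = _current;
        for (const WaitTimeHistogram& h : _completed) result.merge(h);
        return result;
    }

private:
    std::size_t _windowCount;
    WaitTimeHistogram _current;
    std::deque<WaitTimeHistogram> _completed;
};

// Per-target-host wait-time statistics, guarded by a mutex for concurrent callers.
class EgressWaitTimeStats {
public:
    explicit EgressWaitTimeStats(std::size_t windowCount) : _windowCount(windowCount) {}

    void record(const std::string& targetHost, std::chrono::milliseconds wait) {
        std::lock_guard<std::mutex> lk(_mutex);
        _byHost.try_emplace(targetHost, _windowCount).first->second.record(wait.count());
    }

    // Intended to be called once per interval (e.g. every minute) by a periodic job.
    void rotateAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (auto& entry : _byHost) entry.second.rotate();
    }

    // Returns an empty histogram for hosts that have never been recorded.
    WaitTimeHistogram aggregated(const std::string& targetHost) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _byHost.find(targetHost);
        return it == _byHost.end() ? WaitTimeHistogram{} : it->second.aggregated();
    }

private:
    std::size_t _windowCount;
    mutable std::mutex _mutex;
    std::map<std::string, RotatingHistogram> _byHost;
};
```

Fixed bucket counts keep the memory cost per host constant for each interval, and the aggregated view is just a bucket-wise sum; the trade-off is that latency percentiles can only be estimated to bucket resolution.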

For the NITL-based task executors, requests generally acquire egress connections here. ScopedDBConnection, defined here, is also sometimes used to acquire connections, namely in the old "scanning" RSM, in dbclient_rs, and in one or two sharded commands, but it may not be worth collecting metrics for this outdated component.  



 Comments   
Comment by Githook User [ 28/May/22 ]

Author:

{'name': 'Vojislav Stojkovic', 'email': 'vojislav.stojkovic@mongodb.com', 'username': 'vstojkovic-mongodb'}

Message: SERVER-63261 Add metrics for wait time for requests to acquire egress connections
Branch: master
https://github.com/mongodb/mongo/commit/136275a221896f712ee6ba874f6fb0aeb260cb28

Generated at Thu Feb 08 05:57:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.