Documentation / DOCS-15371

Investigate changes in SERVER-63261: Add metrics for wait time for requests to acquire egress connections




      Original Downstream Change Summary

      New keys added to connPoolStats command output.
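
One way to find which keys are new is to capture `connPoolStats` output from a build without the change and from a build with it, then diff the key paths. The sketch below assumes both outputs are already available as Python dicts, for example from `client.admin.command("connPoolStats")` in PyMongo against a running deployment. The helper names and the sample documents are hypothetical, and `newMetricPlaceholder` only stands in for whatever keys SERVER-63261 actually adds, which this ticket does not name.

```python
def key_paths(doc, prefix=()):
    """Return the set of key paths (as tuples) found in a nested dict."""
    paths = set()
    for key, value in doc.items():
        path = prefix + (key,)
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
    return paths


def added_keys(before, after):
    """Key paths present in `after` but absent from `before`, sorted."""
    return sorted(key_paths(after) - key_paths(before))


# Hypothetical, heavily trimmed sample outputs for illustration only.
before = {
    "totalInUse": 0,
    "hosts": {"serverB:27017": {"inUse": 0, "available": 1}},
}
after = {
    "totalInUse": 0,
    "hosts": {
        "serverB:27017": {
            "inUse": 0,
            "available": 1,
            "newMetricPlaceholder": {},  # placeholder, not a real key name
        }
    },
}

for path in added_keys(before, after):
    print(" > ".join(path))
```

Paths are kept as tuples rather than dotted strings because host names in the output can themselves contain dots.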

      Description of Linked Ticket

      On sharded clusters, client requests and system operations need to be routed from mongos to mongod, and occasionally from mongod to mongod. These requests need to acquire an outbound connection from the source host to the target host. This process is asynchronous, and we don't know how long it takes for requests to acquire a connection. As a result, it's difficult for us to determine when egress connection pooling might be a bottleneck for servicing requests, or what the exact impact on request latency is when the connection pool is under strain. 


      We should add metrics that answer the question: 'how much time does a request from (serverA) to (serverB) spend waiting to acquire a connection?' It might be best to use a histogram-based approach, in the style of SERVER-59858, where we maintain a histogram of wait times for the last N connections or over the last X minutes. We could also 'rotate' the histograms, always keeping one for the last (say) minute and an 'aggregated' one covering the last N minutes. Ideally, the histograms would be collected on a per-targeted-host basis.
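
The rotating, per-host histogram idea above could be sketched roughly as follows. This is only an illustration of the proposal and not the server's implementation: the bucket boundaries, class names, and report field names are all assumptions, and rotation is triggered explicitly by the caller (for example once per minute) instead of by a timer.

```python
import bisect
from collections import defaultdict, deque

# Hypothetical bucket boundaries in milliseconds; the ticket specifies none.
# Bucket i covers [BOUNDS[i-1], BOUNDS[i]); the last bucket is [1000, inf).
BUCKET_BOUNDS_MS = [1, 5, 10, 50, 100, 500, 1000]


def _empty_counts():
    return [0] * (len(BUCKET_BOUNDS_MS) + 1)


class RotatingWaitTimes:
    """Keeps one histogram per time window, retaining the newest `windows`."""

    def __init__(self, windows=5):
        # The deque drops the oldest window automatically once full.
        self._windows = deque([_empty_counts()], maxlen=windows)

    def record(self, wait_ms):
        bucket = bisect.bisect_right(BUCKET_BOUNDS_MS, wait_ms)
        self._windows[-1][bucket] += 1

    def rotate(self):
        """Start a new window; call this once per interval."""
        self._windows.append(_empty_counts())

    def latest(self):
        return list(self._windows[-1])

    def aggregated(self):
        """Bucket-wise sum over all retained windows."""
        return [sum(column) for column in zip(*self._windows)]


class EgressWaitMetrics:
    """Tracks connection-acquisition wait times per target host."""

    def __init__(self, windows=5):
        self._per_host = defaultdict(lambda: RotatingWaitTimes(windows))

    def record(self, target_host, wait_ms):
        self._per_host[target_host].record(wait_ms)

    def rotate_all(self):
        for hist in self._per_host.values():
            hist.rotate()

    def report(self):
        return {
            host: {"lastWindow": h.latest(), "aggregated": h.aggregated()}
            for host, h in self._per_host.items()
        }


metrics = EgressWaitMetrics(windows=2)
metrics.record("serverB:27017", 0.5)   # lands in [0, 1) ms
metrics.record("serverB:27017", 7)     # lands in [5, 10) ms
metrics.rotate_all()
metrics.record("serverB:27017", 2000)  # lands in [1000, inf) ms
print(metrics.report())
```

Keeping a fixed number of per-window histograms makes the aggregated view a plain bucket-wise sum, and expiring old data costs nothing beyond dropping the oldest window.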

      Generally, egress connections are acquired by requests here for the NITL-based task executors. ScopedDBConnection, defined here, is also sometimes used to acquire connections, namely in the old "scanning" RSM, dbclient_rs, and one or two sharded commands, but it may not be worth collecting metrics for this outdated component.




            jason.price@mongodb.com Jason Price
            backlog-server-pm Backlog - Core Eng Program Management Team


              1 year, 13 weeks, 2 days ago