DOCS-15639

Investigate changes in SERVER-64965: Count the number of operations that fail due to timing out waiting to acquire a connection

      Original Downstream Change Summary

      Two new metrics were added to serverStatus:
      1) metrics.operation.numConnectionNetworkTimeouts: the number of operations that failed because they timed out while waiting to acquire a connection
      2) metrics.operation.totalTimeWaitingBeforeConnectionTimeoutMillis: the cumulative time, in milliseconds, that operations spent waiting before failing because they timed out while waiting to acquire a connection
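
      As a rough illustration of how the two metrics relate, the sketch below reads them from a serverStatus-shaped document and derives an average wait per failed operation. It assumes the document has already been fetched (for example with PyMongo's `db.command("serverStatus")`) and is passed in as a plain dict. The helper name `connection_timeout_summary` and the sample values are hypothetical, not part of the server or any driver.

```python
from typing import Any, Mapping, Optional


def connection_timeout_summary(server_status: Mapping[str, Any]) -> Optional[dict]:
    """Summarize the two connection-timeout metrics from a serverStatus document.

    Returns None when either metric is absent, e.g. on a server version
    that does not report them.
    """
    operation = server_status.get("metrics", {}).get("operation", {})
    count = operation.get("numConnectionNetworkTimeouts")
    total_ms = operation.get("totalTimeWaitingBeforeConnectionTimeoutMillis")
    if count is None or total_ms is None:
        return None
    # Avoid dividing by zero when no operation has timed out yet.
    avg_ms = total_ms / count if count else 0.0
    return {"timeouts": count, "total_wait_ms": total_ms, "avg_wait_ms": avg_ms}


# Hypothetical sample values, shaped like the relevant part of serverStatus.
sample_status = {
    "metrics": {
        "operation": {
            "numConnectionNetworkTimeouts": 4,
            "totalTimeWaitingBeforeConnectionTimeoutMillis": 10000,
        }
    }
}
summary = connection_timeout_summary(sample_status)
```

      A high average wait alongside a growing timeout count would suggest that operations are spending most of their time budget waiting on the connection pool.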

      Description of Linked Ticket

      When 'bursts' of operations occur that all need a connection to perform an RPC, our connection pools don't always have enough pooled connections to service all of them. In this case, operations get bottlenecked behind connection establishment. In more extreme cases, operations fail because they reach their maxTimeMS limit while waiting to acquire a connection. To better understand when our connection pooling infrastructure contributes to user-facing workload degradation, let's add a counter for the number of operations that fail due to timing out while waiting to acquire a connection. This counter should be reported in FTDC. Additionally, let's log how long operations that fail for this reason spent waiting to acquire a connection, so we can check whether an unreasonable amount of time was spent waiting.
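
      Since the goal is to spot bursts, one way to use such a counter is to compare two samples taken some time apart. The sketch below assumes both metrics are cumulative since process start (so a decrease means the process restarted) and that each argument is the `metrics.operation` sub-document from one serverStatus sample. The function name `timeout_delta` and the sample numbers are hypothetical.

```python
from typing import Mapping

# The two metric names introduced by SERVER-64965.
METRIC_KEYS = (
    "numConnectionNetworkTimeouts",
    "totalTimeWaitingBeforeConnectionTimeoutMillis",
)


def timeout_delta(earlier: Mapping[str, int], later: Mapping[str, int]) -> dict:
    """Return the per-interval increase of each metric between two samples."""
    delta = {}
    for key in METRIC_KEYS:
        diff = later[key] - earlier[key]
        if diff < 0:
            # Assumption: a lower value means the counters were reset by a
            # restart, so everything in the later sample is new.
            diff = later[key]
        delta[key] = diff
    return delta


# Hypothetical samples taken before and after a burst of operations.
before = {
    "numConnectionNetworkTimeouts": 10,
    "totalTimeWaitingBeforeConnectionTimeoutMillis": 25000,
}
after = {
    "numConnectionNetworkTimeouts": 13,
    "totalTimeWaitingBeforeConnectionTimeoutMillis": 34000,
}
burst = timeout_delta(before, after)
```

      Here the interval saw 3 new timeouts and 9000 ms of additional waiting, which is the kind of jump the ticket wants to make visible.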

            Assignee:
            jason.price@mongodb.com Jason Price
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team

              Created:
              Updated:
              Resolved:
              1 year, 29 weeks, 4 days ago