- Type: Task
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Labels: None
Two new metrics were added to serverStatus:
1) metrics.operation.numConnectionNetworkTimeouts: the number of operations that failed because they timed out while waiting to acquire a connection.
2) metrics.operation.totalTimeWaitingBeforeConnectionTimeoutMillis: the cumulative time, in milliseconds, that those operations spent waiting before they failed with that timeout.
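As a rough illustration of how the two counters might be read together, the sketch below derives an average wait per timed-out operation from a serverStatus-shaped document. The helper name `connection_timeout_stats` and the sample values are hypothetical; only the two metric paths come from this ticket. In practice the input would presumably be the document returned by the serverStatus command (for example, via PyMongo's `db.command("serverStatus")`).

```python
from typing import Any, Mapping, Optional


def connection_timeout_stats(server_status: Mapping[str, Any]) -> dict:
    """Summarize the two connection-timeout counters from a serverStatus document.

    Hypothetical helper: missing fields are treated as 0 so the function also
    works against servers that do not report these metrics.
    """
    operation = server_status.get("metrics", {}).get("operation", {})
    timeouts = operation.get("numConnectionNetworkTimeouts", 0)
    total_wait_ms = operation.get("totalTimeWaitingBeforeConnectionTimeoutMillis", 0)

    # Both counters are cumulative, so this is an average over the whole
    # reporting period, not over a recent window.
    avg_wait_ms: Optional[float] = total_wait_ms / timeouts if timeouts else None

    return {
        "timeouts": timeouts,
        "total_wait_ms": total_wait_ms,
        "avg_wait_ms": avg_wait_ms,
    }


# Made-up sample values, for illustration only.
sample_status = {
    "metrics": {
        "operation": {
            "numConnectionNetworkTimeouts": 4,
            "totalTimeWaitingBeforeConnectionTimeoutMillis": 10000,
        }
    }
}

stats = connection_timeout_stats(sample_status)
print(stats)
```

Because both values are cumulative, comparing two samples taken some time apart (and subtracting) is likely more useful for spotting a specific burst than looking at a single reading.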
Description of Linked Ticket
When a burst of operations occurs and every operation needs a connection to perform an RPC, our connection pools do not always have enough pooled connections to service all of them. In that case, operations are bottlenecked behind connection establishment. In more extreme cases, operations fail because they reach their maxTimeMS limit while still waiting to acquire a connection.

To better understand when our connection pooling infrastructure contributes to user-facing workload degradation, let's add a counter for the number of operations that fail because they timed out waiting to acquire a connection. This counter should be reported in FTDC. Additionally, let's log how long each operation that fails for this reason spent waiting to acquire a connection, so we can check whether an unreasonable amount of time was spent waiting.
- documents: SERVER-64965 Count the number of operations that fail due to timing out waiting to acquire a connection (Closed)