- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Fully Compatible
- Service Arch 2022-05-16, Service Arch 2022-05-30, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11, Service Arch 2022-07-25, Service Arch 2022-08-08, Service Arch 2022-08-22, Service Arch 2022-09-05, Service Arch 2022-09-19, Service Arch 2022-10-03
When a burst of operations arrives and each operation needs a connection to perform an RPC, our connection pools don't always hold enough pooled connections to serve them all. Those operations then bottleneck behind connection establishment. In more extreme cases, operations fail because they reach their maxTimeMS limit while waiting to acquire a connection. To better understand when our connection pooling infrastructure contributes to user-facing workload degradation, let's add a counter for operations that fail because they timed out waiting to acquire a connection. This counter should be reported in FTDC. Additionally, let's log how long each operation that fails for this reason spent waiting to acquire a connection, so we can confirm whether an unreasonable amount of time was spent waiting.
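
To make the request concrete, here is a minimal sketch of the idea in Python. It is not the server's implementation (which is not Python), and every name in it (`CountingPool`, `AcquireTimeout`, `acquire_timeouts`, the `acquireTimeouts` stats key) is hypothetical and chosen only for illustration. It shows a pool that increments a counter and logs the time spent waiting whenever an operation's deadline expires before a connection becomes available.

```python
import logging
import queue
import threading
import time

logger = logging.getLogger("connpool_sketch")


class AcquireTimeout(Exception):
    """Raised when the deadline passes before a connection is available."""

    def __init__(self, waited_ms):
        super().__init__(
            f"timed out after {waited_ms:.1f} ms waiting for a connection"
        )
        self.waited_ms = waited_ms


class CountingPool:
    """Hypothetical fixed-size pool that counts acquisition timeouts."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)
        self._lock = threading.Lock()
        self.acquire_timeouts = 0  # hypothetical counter name

    def acquire(self, max_time_ms):
        """Return an idle connection, waiting at most max_time_ms."""
        start = time.monotonic()
        try:
            return self._idle.get(timeout=max_time_ms / 1000.0)
        except queue.Empty:
            waited_ms = (time.monotonic() - start) * 1000.0
            with self._lock:
                self.acquire_timeouts += 1
            # Log the wait so it can be judged reasonable or not.
            logger.warning(
                "operation timed out waiting to acquire a connection; "
                "waited %.1f ms (limit %d ms)",
                waited_ms,
                max_time_ms,
            )
            raise AcquireTimeout(waited_ms) from None

    def release(self, conn):
        """Return a connection to the pool."""
        self._idle.put(conn)

    def stats(self):
        """Snapshot that a periodic diagnostics collector could sample."""
        with self._lock:
            return {"acquireTimeouts": self.acquire_timeouts}
```

In this sketch, a diagnostics collector would call `stats()` on a fixed interval, which stands in for reporting the counter in FTDC. The counter is incremented only on the timeout path, so successful acquisitions add no extra locking.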