[DOCS-15639] Investigate changes in SERVER-64965: Count the number of operations that fail due to timing out waiting to acquire a connection Created: 20/Sep/22  Updated: 13/Nov/23  Resolved: 26/Sep/22

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 6.2.0-rc0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Jason Price
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-64965 Count the number of operations that f... Closed
Participants:
Days since reply: 1 year, 19 weeks, 2 days ago
Epic Link: DOCSP-22091
Story Points: 3

 Description   
Original Downstream Change Summary

Two new metrics were added to the serverStatus output:
1) metrics.operation.numConnectionNetworkTimeouts: the number of operations that failed because they timed out while waiting to acquire a connection
2) metrics.operation.totalTimeWaitingBeforeConnectionTimeoutMillis: the cumulative time, in milliseconds, that operations spent waiting before they failed because they timed out while waiting to acquire a connection
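
As a rough illustration of how the two counters above might be read together, the following Python sketch pulls both fields out of a serverStatus-style document and derives an average wait per failed operation. The helper name `connection_timeout_metrics`, the `averageWaitMillis` key, and the sample values are hypothetical and used only for illustration; only the two metric paths come from this ticket.

```python
from typing import Any, Mapping, Optional


def connection_timeout_metrics(server_status: Mapping[str, Any]) -> dict:
    """Extract the two connection-timeout counters from a serverStatus-style document."""
    operation = server_status.get("metrics", {}).get("operation", {})
    # Treat a missing field as 0 (for example, on a server version without these metrics).
    timeouts = int(operation.get("numConnectionNetworkTimeouts", 0))
    waited_ms = int(operation.get("totalTimeWaitingBeforeConnectionTimeoutMillis", 0))
    # Derived value (not reported by the server): average wait per failed operation.
    average_ms: Optional[float] = waited_ms / timeouts if timeouts else None
    return {
        "numConnectionNetworkTimeouts": timeouts,
        "totalTimeWaitingBeforeConnectionTimeoutMillis": waited_ms,
        "averageWaitMillis": average_ms,
    }


# Hypothetical sample document; the numbers are made up for illustration.
sample = {
    "metrics": {
        "operation": {
            "numConnectionNetworkTimeouts": 4,
            "totalTimeWaitingBeforeConnectionTimeoutMillis": 2000,
        }
    }
}
print(connection_timeout_metrics(sample))
```

With PyMongo, the input document could be obtained from `client.admin.command("serverStatus")`; that call is left out of the sketch so it runs without a server.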

Description of Linked Ticket

When 'bursts' of operations occur that all require access to a connection to perform some RPC, our connection pools don't always have enough pooled connections to service all of the operations. In this case, operations get bottlenecked behind connection establishment. In more extreme cases, operations will fail because they reach their maxTimeMS limit while waiting to acquire a connection. To better understand when our connection pooling infrastructure is related to user-facing workload degradation, let's add a counter for how many operations fail due to timing out while waiting to acquire a connection. This counter should be reported in FTDC. Additionally, let's make sure we log how long operations that fail for this reason spent waiting to acquire a connection, so we can check whether an unreasonable amount of time was spent waiting.
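
The failure mode described above can be sketched with a toy pool. This is not the server's connection pool implementation; `ToyConnectionPool` and its attribute names are hypothetical, chosen only to show one counter being incremented and one cumulative wait time being recorded when an acquire attempt times out.

```python
import queue
import time


class ToyConnectionPool:
    """Illustrative pool that counts acquire attempts that time out."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)
        # Counters loosely modeled on the two metrics named in this ticket.
        # Plain ints are not thread-safe; a real pool would guard them.
        self.num_connection_network_timeouts = 0
        self.total_time_waiting_before_timeout_millis = 0

    def acquire(self, timeout_seconds):
        """Return an idle connection, or raise TimeoutError after waiting."""
        start = time.monotonic()
        try:
            return self._idle.get(timeout=timeout_seconds)
        except queue.Empty:
            waited_ms = int((time.monotonic() - start) * 1000)
            self.num_connection_network_timeouts += 1
            self.total_time_waiting_before_timeout_millis += waited_ms
            raise TimeoutError(
                f"timed out after {waited_ms} ms waiting for a connection"
            ) from None

    def release(self, conn):
        """Hand a connection back to the pool."""
        self._idle.put(conn)


pool = ToyConnectionPool(["conn-1"])
first = pool.acquire(timeout_seconds=0.05)  # succeeds immediately
try:
    pool.acquire(timeout_seconds=0.05)  # pool is empty, so this times out
except TimeoutError:
    pass
pool.release(first)
print(pool.num_connection_network_timeouts,
      pool.total_time_waiting_before_timeout_millis)
```

In this sketch a burst of callers larger than the pool would each wait up to their timeout, and every failed wait adds to both counters, which is the signal the ticket proposes to expose.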



 Comments   
Comment by Githook User [ 26/Sep/22 ]

Author: jason-price-mongodb <69260375+jason-price-mongodb@users.noreply.github.com>

Message: DOCS-15639-connection-network-timeouts (#1899)

Co-authored-by: jason-price-mongodb <jshfjghsdfgjsdjh@aolsdjfhkjsdhfkjsdf.com>
Branch: v6.2
https://github.com/10gen/docs-mongodb-internal/commit/0eebaed2200898ee1c63b84b0421610542fdd5e5

Comment by Education Bot [ 20/Sep/22 ]

Fix Version updated for upstream SERVER-64965:
6.2.0-rc0
