Details
Investigation
Status: Accepted
Major - P3
Resolution: Unresolved
Not Needed
Description
Possible changes to FTDC/serverStatus.
Description of Linked Ticket
Summary
We should refactor and reorganize the ConnectionPool implementation to remove unnecessary abstractions, which would make the code easier to understand and maintain. Allocating a dedicated executor thread to each ConnectionPool can also make its behavior more predictable (e.g., by making many synchronization points unnecessary). We may also evaluate whether replacing the current design's many connection pools with a single pool that provides similar guarantees for egress connections is feasible and worthwhile. Finally, we should add more diagnostics to the pool (e.g., running averages of the time spent creating new connections, refreshing existing ones, and returning checked-out ones) and create an FTDC section that reports these metrics for every existing ConnectionPool instance. These improvements would help with investigating incidents similar to HELP-27338 and align with making the codebase more maintainable and easier to debug.
Motivation
Managing egress connections (e.g., creating new connections and refreshing existing ones) is implemented through several layers of abstraction, such as ConnectionPool, SpecificPool, TLConnection, and TLTypeFactory. These types interact internally to maintain a set of connection pools and rely on an external executor (i.e., the networking reactor thread) for housekeeping. For diagnostics, each pool reports only the aggregate numbers of created, available, refreshing, refreshed, and in-use connections. Furthermore, these metrics are recorded only on mongos, and only for the connection pools owned by the ShardingTaskExecutor.