[COMPASS-6022] Investigate changes in SERVER-64964: Measure egress connection creation time from connection pools Created: 16/Aug/22  Updated: 23/Aug/22  Resolved: 23/Aug/22

Status: Closed
Project: Compass
Component/s: None
Affects Version/s: None
Fix Version/s: No version

Type: Investigation Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-64964 Measure egress connection creation ti... Closed
Documentation Changes: Not Needed

 Description   
Original Downstream Change Summary

Adds a server parameter called "slowConnectionThresholdMillis" that specifies the threshold for slow egress connection establishment.

Emits a log line with ID 6496400 when the connection establishment time exceeds the configured threshold, with the breakdown of how long each phase of the connection establishment took.
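As a rough illustration of how a downstream tool might pick out this log line, the sketch below filters log lines by ID. It assumes the server's structured JSON log format, where each line is a JSON object with a numeric "id" field. The helper name slow_connection_entries is hypothetical, and the attribute names in the per-phase breakdown are not specified in this ticket, so the sketch does not inspect them.

```python
import json

SLOW_CONNECTION_LOG_ID = 6496400  # log ID named in the change summary above


def slow_connection_entries(lines):
    """Yield parsed log entries whose "id" equals SLOW_CONNECTION_LOG_ID.

    Hypothetical helper: assumes one JSON object per line and skips
    anything that is blank or does not parse as JSON.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if isinstance(entry, dict) and entry.get("id") == SLOW_CONNECTION_LOG_ID:
            yield entry
```

The function accepts any iterable of strings, so it can be fed an open log file object directly.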

Adds "metrics.mongos.totalConnectionEstablishmentTimeMillis" to the output of serverStatus, tracking the total time for egress connection establishment.
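A minimal sketch of reading the new field from a serverStatus result follows. It assumes the result has already been fetched (for example through a driver) and is available as a nested dictionary. The helper get_dotted is hypothetical, and the lookup returns a default when the field is absent, as it would be on a server that does not report it.

```python
METRIC_PATH = "metrics.mongos.totalConnectionEstablishmentTimeMillis"


def get_dotted(doc, path, default=None):
    """Walk a nested dict along a dotted path; return default if any key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

For example, get_dotted(server_status, METRIC_PATH) yields the value when present and None otherwise, where server_status stands in for the already-fetched document.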

Description of Linked Ticket

Mongos (and mongod) maintain connection pools wrapped by task executors. These pooled connections are used to perform RPC. Sometimes these RPC operations come in 'bursts', and no pre-existing pooled connection is available to serve some of them. In these cases, operations are bottlenecked on establishing new connections. We'd like to know how much time is spent establishing new connections so that we can better understand this potential bottleneck.

Let's measure the wall-clock time of this new-connection establishment.

Ideally, we would also include partial measurements for each step of connection establishment, including:

  • Establishing the TCP connection/sockets
  • Resolving DNS
  • Performing the TLS handshake
  • Performing MongoDB auth

Parts of the detailed breakdown, especially MongoDB auth, may be better suited to being broken out into separate tickets after some investigation.
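The measurement described above, a total wall-clock time plus a per-phase breakdown compared against a threshold, can be sketched generically as follows. This is not the server's implementation: time_phases and its parameters are hypothetical, and each phase is modelled as an arbitrary callable so that no real DNS, TCP, TLS, or auth work is assumed.

```python
import time


def time_phases(phases, threshold_millis, clock=time.monotonic):
    """Run (name, callable) phases in order and time each one.

    Returns (total_millis, breakdown, is_slow), where breakdown maps each
    phase name to its elapsed milliseconds and is_slow is True when the
    total exceeds threshold_millis. The clock is injectable for testing.
    """
    breakdown = {}
    total = 0.0
    for name, run in phases:
        start = clock()
        run()
        elapsed = (clock() - start) * 1000.0
        breakdown[name] = elapsed
        total += elapsed
    return total, breakdown, total > threshold_millis
```

A caller would pass one callable per step and emit a log entry with the breakdown only when is_slow is True, mirroring the threshold behaviour in the change summary.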

