[DOCS-15652] [SERVER] Investigate changes in SERVER-64967: Measure how long it takes operations using egress connections to write to network Created: 28/Sep/22  Updated: 13/Nov/23  Resolved: 06/Oct/22

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 6.2.0-rc0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Jason Price
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File output.txt    
Issue Links:
Documented
documents SERVER-64967 Measure how long it takes operations ... Closed
Participants:
Days since reply: 1 year, 17 weeks, 6 days ago
Epic Link: DOCSP-22091
Story Points: 5

 Description   
Original Downstream Change Summary

This adds a new serverStatus metric that is guarded by the featureFlagConnHealthMetrics feature flag:

'metrics.network.totalTimeForEgressConnectionAcquiredToWireMicros':

  • The time taken for an operation to acquire an egress connection and complete writing to the wire. This metric is cumulative across all such connections since the last server restart.
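Because the metric is cumulative, a single reading is mostly useful when compared against an earlier one. The following is a minimal sketch, assuming the serverStatus output has already been fetched and is available as a plain Python mapping. The helper names read_wire_metric and micros_between are hypothetical, and treating a decreasing value as a restart is an assumption of this sketch, not documented server behavior.

```python
from typing import Any, Mapping, Optional

# Path of the metric inside a serverStatus document, as named in this ticket.
METRIC_PATH = (
    "metrics",
    "network",
    "totalTimeForEgressConnectionAcquiredToWireMicros",
)


def read_wire_metric(server_status: Mapping[str, Any]) -> Optional[int]:
    """Return the cumulative metric in microseconds, or None if it is absent.

    The metric may be missing, for example when the feature flag is disabled.
    """
    node: Any = server_status
    for key in METRIC_PATH:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return int(node)


def micros_between(
    earlier: Mapping[str, Any], later: Mapping[str, Any]
) -> Optional[int]:
    """Return the microseconds accumulated between two serverStatus samples.

    Returns None if either sample lacks the metric or if the counter went
    backwards, which this sketch assumes means the server restarted.
    """
    before = read_wire_metric(earlier)
    after = read_wire_metric(later)
    if before is None or after is None or after < before:
        return None
    return after - before
```

The difference between two samples, divided by the sampling interval, gives a rough view of how much time operations spent between acquiring an egress connection and finishing their write during that interval.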

A new server parameter, 'connectionAcquisitionToWireLoggingRate', was also added:
description: >-
The rate at which egress connection metrics below a certain time threshold will be logged at
info level. This only applies for the 'network.totalConnectionAcquiredToWireMillis'
server status metric.

This server parameter can be set at startup and at runtime. The default value is 0.2. Values must be between 0.0 and 1.0.
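The range constraint above can be checked on the client side before the value is sent to the server. The following is a minimal sketch, assuming the parameter is changed at runtime through the setParameter command. The helper name build_logging_rate_command is hypothetical, and the server performs its own validation regardless.

```python
def build_logging_rate_command(rate: float) -> dict:
    """Build a setParameter command document for the logging rate.

    Raises TypeError for non-numeric input and ValueError when the rate
    falls outside the 0.0 to 1.0 range stated in this ticket.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("rate must be a number")
    # NaN fails this comparison and is rejected as well.
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # The command name must be the first key; Python dicts keep insertion order.
    return {
        "setParameter": 1,
        "connectionAcquisitionToWireLoggingRate": float(rate),
    }
```

The resulting document would typically be run against the admin database with whichever driver is in use, for example by passing it to a driver's generic command method.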

Description of Linked Ticket

We sometimes see operations get bottlenecked at the point in their lifecycle where they need to perform RPC. In SERVER-64964, SERVER-64965, and SERVER-63261, we're adding metrics to measure how long connection establishment takes, how many operations fail while waiting to acquire connections, and how long operations spend waiting to acquire connections.

For completeness, let's also measure the amount of time operations spend doing networking after they acquire a connection. Add a measurement of how much wall-time passes between an operation acquiring a connection and that operation completing its write to the wire.
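The measurement described above can be illustrated with a small sketch. This is not the server's implementation: the class AcquiredToWireTimer and its method names are hypothetical, and the use of a monotonic nanosecond clock is an assumption chosen so that elapsed time cannot go negative.

```python
import time
from typing import Callable


class AcquiredToWireTimer:
    """Accumulate elapsed time from connection acquisition to write completion."""

    def __init__(self, clock_ns: Callable[[], int] = time.monotonic_ns) -> None:
        # The clock is injectable so the timer can be exercised deterministically.
        self._clock_ns = clock_ns
        # Cumulative total in microseconds, mirroring the metric's unit.
        self.total_micros = 0

    def on_connection_acquired(self) -> int:
        """Record and return the timestamp at which a connection was acquired."""
        return self._clock_ns()

    def on_write_complete(self, acquired_at_ns: int) -> int:
        """Add the elapsed time for one operation to the total and return it."""
        elapsed_micros = (self._clock_ns() - acquired_at_ns) // 1_000
        self.total_micros += elapsed_micros
        return elapsed_micros
```

Each operation calls on_connection_acquired when it obtains its egress connection and passes the returned timestamp to on_write_complete once its bytes are on the wire; total_micros then plays the role of the cumulative counter.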



 Comments   
Comment by Githook User [ 06/Oct/22 ]

Author:

{'name': 'jason-price-mongodb', 'email': '69260375+jason-price-mongodb@users.noreply.github.com', 'username': 'jason-price-mongodb'}

Message: DOCS-15652 write to wire metric (#1963)

Co-authored-by: jason-price-mongodb <jshfjghsdfgjsdjh@aolsdjfhkjsdhfkjsdf.com>
Branch: v6.2
https://github.com/10gen/docs-mongodb-internal/commit/834c5705d92283a601a439e85da3ccafee169430

Comment by Jason Chan [ 28/Sep/22 ]

Attached server status metric output

Comment by Education Bot [ 28/Sep/22 ]

Fix Version updated for upstream SERVER-64967:
6.2.0-rc0

Generated at Thu Feb 08 08:13:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.