[SERVER-50763] Log when flow control is engaged and disengaged Created: 03/Sep/20  Updated: 06/Dec/22  Resolved: 25/May/21

Status: Closed
Project: Core Server
Component/s: Logging, Performance
Affects Version/s: 4.4.0
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Mark Callaghan (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Participants:

 Description   

The engagement determination will be based on whether the targetRateLimit (as seen in FTDC) is throttled to be less than the maximum. Every time the targetRateLimit transitions from "max" to "less than max", the "flow control engaged" message will be logged, and every time the targetRateLimit transitions from "less than max" to "max", the "flow control disengaged" message will be logged. Note that depending on the workload, these messages could be very noisy, so let's not introduce them at the default log level, but instead at debug log level 1.

Original description:

This is a feature request. The mongod error log has messages to indicate when flow control is engaged. This request is to log when it is disengaged. That makes it easier to see when flow control might explain performance problems.

An example when it is engaged is:

{"t":{"$date":"2020-09-03T20:18:35.656+00:00"},"s":"W",  "c":"STORAGE",  "id":22225,   "ctx":"FlowControlRefresher","msg":"Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries."}



 Comments   
Comment by Connie Chen [ 25/May/21 ]

We don't have any customer requirements and are concerned that any log messages may be misinterpreted.

Comment by Eric Milkie [ 14/Sep/20 ]

After chatting with Mark, I have changed the Description to reflect the technical description of what we want: a message indicating when flow control starts throttling (at all), and when it stops throttling. As Bruce points out, this will indeed require some knowledge about how flow control works: the log messages will be tracking the targetRateLimit statistic, which is equivalent to the number of operation tickets available per quantum. As the targetRateLimit decreases, this reduces the amount of potential work able to run.

Comment by Bruce Lucas (Inactive) [ 04/Sep/20 ]

This would be useful, but requires cautious interpretation, because flow control disengaging is a gradual process - when it detects that the secondaries are no longer lagged, it begins gradually lifting the throttle. You can see this effect in ftdc in the targetRateLimit metric, and you can gauge the magnitude of flow control impact at any given point in time with the timeAcquiringMicros metric.

Generated at Thu Feb 08 05:23:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.