[SERVER-46114] Flow-control engages on a single-node replica set Created: 12/Feb/20  Updated: 27/Oct/23  Resolved: 16/Apr/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Geert Bosch Assignee: Maria van Keulen
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Comparison.png     PNG File Flow Control disengaged.png     PNG File No Flow Control.png     PNG File SERVER-46114.png     PNG File lastDurable lag comparison.png     PNG File single node rs lag.png    
Issue Links:
Related
related to SERVER-45880 Flow Control lag detection mechanism ... Closed
Operating System: ALL
Steps To Reproduce:

The log file includes messages like:

2020-02-12T16:33:39.901-0500 W  STORAGE  [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.

Sprint: Execution Team 2020-04-20
Participants:

 Description   

Flow-control is engaging on single-node replica sets, but it should not.

2020-02-12T16:33:39.901-0500 W  STORAGE  [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.
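
A quick way to confirm the throttling is to poll the flowControl section of serverStatus while the workload runs. Below is a minimal pymongo sketch; the connection URI, replica set name, and polling interval are assumptions, and the metric names are those reported by 4.2+ servers.

# Minimal sketch: poll flow control metrics on a single-node replica set.
# The URI and replica set name are assumptions; adjust to your deployment.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

for _ in range(10):
    fc = client.admin.command("serverStatus").get("flowControl", {})
    # isLagged indicates the lag check has tripped; timeAcquiringMicros grows
    # while writers wait on flow control tickets.
    print("enabled=%s isLagged=%s targetRateLimit=%s timeAcquiringMicros=%s" % (
        fc.get("enabled"), fc.get("isLagged"),
        fc.get("targetRateLimit"), fc.get("timeAcquiringMicros")))
    time.sleep(5)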



 Comments   
Comment by Mark Callaghan (Inactive) [ 03/Sep/20 ]

Percona shared a bug report with me, and this is the cause; see SERVER-50749. The impact is significant: first, the load rate drops; second, some inserts take 200 to 300 seconds when flow control engages.

Comment by Alexander Gorrod [ 01/Jul/20 ]

From the chart, it would appear that flow control is smoothing out the spikes in throughput over time, at the expense of average throughput.

It looks like I jumped too soon on this one. I still think it's surprising that flow control is kicking in on a single-node replica set, but it isn't hurting the pytpcc load phase in my testing, so that isn't a reason to resurrect this ticket.

Comment by Bruce Lucas (Inactive) [ 30/Jun/20 ]

For the purposes of doing that comparison it might be useful to disable flow control on both 4.2 and 4.4 in order to remove that complication and focus on measuring the relative storage engine performance.
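
For reference, flow control can be toggled at runtime via the enableFlowControl server parameter, so a comparison run does not require a restart. A minimal pymongo sketch (the connection URI is an assumption):

# Minimal sketch: disable flow control for an A/B comparison run.
# enableFlowControl is the documented server parameter; the URI is an
# assumption and should point at the primary under test.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Turn flow control off for the benchmark run...
client.admin.command("setParameter", 1, enableFlowControl=False)

# ...and re-enable it afterwards (it defaults to on).
# client.admin.command("setParameter", 1, enableFlowControl=True)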

Comment by Eric Milkie [ 30/Jun/20 ]

We could consider it, but my understanding of WT-6444 is that it is investigating the performance difference between 4.2 and 4.4, and for both of those versions the flow control logic is the same.

From the chart, it would appear that flow control is smoothing out the spikes in throughput over time, at the expense of average throughput.  Wouldn't a chart look similar for a multi-node replica set?  I am not sure if users would prefer more spiky performance or more stable performance, and also not sure if that preference is different for one-node replica sets.

 

Comment by Alexander Gorrod [ 30/Jun/20 ]

milkie and michael.gargiulo I've been running the pytpcc load workload described in WT-6444 and trying to improve WiredTiger's performance on that workload. The workload runs against a 1-node replica set, and I'm seeing flow control kick in and reduce the average throughput from 35k to 27k operations/second. See the picture below, which shows a lot of threads waiting on flow control. The throughput is also less stable, but we have other changes in testing that should make the throughput more stable and achieve an even higher throughput.

Would you reconsider whether flow control should be disabled in 1-node replica set configurations?

Comment by Maria van Keulen [ 16/Apr/20 ]

In case anybody is curious, I also did a run of this workload with Flow Control disabled entirely, with similar write latency spikes and durable/lastCommitted lag:

Here's a side-by-side comparison, including throughput:

Comment by Maria van Keulen [ 16/Apr/20 ]

It looks like the lastCommitted lag coincides with the lastDurable lag for this workload:

The lastDurable OpTime is used to determine the lastCommitted OpTime by default. So, any lag between lastApplied and lastDurable can influence lastCommitted lag.
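
One way to see that relationship directly is to compare the wall clock times that replSetGetStatus reports for the applied, durable, and committed optimes on the primary. A minimal pymongo sketch; the connection URI is an assumption, and the field names are those reported by 4.2+ servers:

# Minimal sketch: compare lastApplied / lastDurable / lastCommitted wall clock
# times on the primary to see whether durable lag is driving committed lag.
# The URI is an assumption; point it at the node under test.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

optimes = client.admin.command("replSetGetStatus")["optimes"]
applied = optimes["lastAppliedWallTime"]
durable = optimes["lastDurableWallTime"]
committed = optimes["lastCommittedWallTime"]

print("applied - durable  :", (applied - durable).total_seconds(), "s")
print("applied - committed:", (applied - committed).total_seconds(), "s")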

daniel.gottlieb also pointed out that an overwhelmed system could starve writers long enough for Flow Control to kick in.

There are spikes in write latency throughout this workload, even when Flow Control isn't engaged, suggesting strain on the mongod. Here's an example of such spikes in a section where Flow Control was not throttling any writes:

I think the lag in lastDurable is due to the strain on the system, and we can conclude that the Flow Control lag measure can be overstated when the node itself is under strain. There is an existing ticket, SERVER-45880, to reconsider whether lastApplied versus lastCommitted is the best measure of lag. Other than that, I don't think there's any further work that needs to be done for SERVER-46114.

Comment by Maria van Keulen [ 15/Apr/20 ]

I was able to reproduce this issue locally. I looked at the FTDC data, and Flow Control's lag detection does report "lag".
Flow Control's lag check takes a few factors into consideration. Most notably, it considers the difference between the lastCommitted wall clock time and the lastApplied wall clock time measured on the primary. As of SERVER-42917, it is not possible for this lag check to be thrown off by a lastCommitted wall clock time that happens to be greater than lastApplied wall clock time (see comment thread in SERVER-42906 for a description of how this can happen).
However, I found in my repro that it is possible for the lastCommitted wall clock time (and associated OpTime) to be behind the lastApplied wall clock time/OpTime, even in a single-node replica set. As a result, Flow Control will report a "lag", and throttle the primary.
I'm not sure how this can be the case, so I'll continue to investigate. I'll keep this ticket posted with any findings.
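
A simplified sketch of the shape of that check (not the server's actual implementation): the lag is the wall clock gap between lastApplied and lastCommitted on the primary, and flow control engages once it reaches flowControlThresholdLagPercentage * flowControlTargetLagSeconds (documented defaults 0.5 and 10 seconds).

# Simplified sketch only -- not the server's implementation.
from datetime import datetime

FLOW_CONTROL_TARGET_LAG_SECONDS = 10.0       # flowControlTargetLagSeconds default
FLOW_CONTROL_THRESHOLD_LAG_PERCENTAGE = 0.5  # flowControlThresholdLagPercentage default

def is_lagged(last_applied_wall: datetime, last_committed_wall: datetime) -> bool:
    lag_seconds = (last_applied_wall - last_committed_wall).total_seconds()
    return lag_seconds >= FLOW_CONTROL_THRESHOLD_LAG_PERCENTAGE * FLOW_CONTROL_TARGET_LAG_SECONDS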
