[SERVER-67434] Improve Sync Source Selection with Chained Replication and Flow Control Created: 22/Jun/22  Updated: 01/Nov/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 5.2.0, 5.3.1, 5.2.1, 5.0.9, 4.4.15, 4.2.21
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Diego Rodriguez (Inactive) Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:
Case:

 Description   

Hi Team,

MongoDB v4.2 introduced the Flow Control mechanism, which limits the rate at which the primary applies its writes in order to keep the majority committed lag under a configurable maximum, flowControlTargetLagSeconds.

At the same time, and whenever replication chaining is enabled, the sync source of a secondary will be changed if the most recent OpTime of the sync source is more than maxSyncSourceLagSecs seconds behind another member's latest oplog entry. This ensures that the sync source is not too far behind other nodes in the set. maxSyncSourceLagSecs is a server parameter and has a default value of 30 seconds.
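
To make the rule above concrete, here is a minimal sketch of the lag check as described in this ticket. It is not the server's actual implementation: the `Member` type, the `last_optime_secs` field, and the function name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

MAX_SYNC_SOURCE_LAG_SECS = 30  # default value stated in this ticket


@dataclass
class Member:
    name: str
    last_optime_secs: int  # hypothetical: timestamp (in seconds) of the newest oplog entry


def should_change_sync_source(source, others, max_lag_secs=MAX_SYNC_SOURCE_LAG_SECS):
    """Return True if any other member is more than max_lag_secs ahead of the sync source."""
    return any(m.last_optime_secs - source.last_optime_secs > max_lag_secs for m in others)
```

With the default of 30 seconds, a sync source that is 25 seconds behind the primary is kept, while one that is 31 seconds behind is dropped.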

The problem is that the default value of maxSyncSourceLagSecs (30 seconds) is three times the default value of flowControlTargetLagSeconds (10 seconds). As a result, the primary can be throttled by the Flow Control mechanism just because one secondary lags behind while enough secondaries to make up a majority replicate from it. Imagine the following scenario:

  1. Pri (DC1), Sec (DC1), Lagged-Sec (DC2), Chained-Sec (DC3), Chained-Sec (DC3)
  2. Sec syncs from Pri, Lagged-Sec syncs from Pri, and Chained-Sec syncs from Lagged-Sec
  3. If any issue with Lagged-Sec causes it to accumulate between 10 and 30 seconds of replication lag, three nodes end up more than 10 seconds behind, which kicks off Flow Control
  4. This can severely impact applications, and it can go on for as long as the lag floats between 10 and 30 seconds or shows up as isolated spikes in that same range.
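
The scenario above can be sketched as a small calculation. This is a simplified model, not server code. It assumes that all five members vote, that each Chained-Sec lags exactly as much as Lagged-Sec, and that Flow Control engages as soon as the majority committed lag exceeds the target. The function names are hypothetical.

```python
def majority_committed_lag(lags_secs):
    """Lag of the newest point that a majority of members have replicated."""
    majority = len(lags_secs) // 2 + 1
    # The majority-th most caught-up member determines the majority commit point.
    return sorted(lags_secs)[majority - 1]


def scenario(lagged_sec_lag, flow_control_target=10, max_sync_source_lag=30):
    """Model the five-node topology from the ticket for a given Lagged-Sec lag."""
    lags = {
        "Pri": 0,
        "Sec": 0,
        "Lagged-Sec": lagged_sec_lag,
        "Chained-Sec-1": lagged_sec_lag,  # assumption: inherits its source's lag
        "Chained-Sec-2": lagged_sec_lag,
    }
    commit_lag = majority_committed_lag(list(lags.values()))
    return {
        "majority_committed_lag": commit_lag,
        "flow_control_engaged": commit_lag > flow_control_target,
        "chained_secs_change_source": lagged_sec_lag > max_sync_source_lag,
    }
```

Under these assumptions, any Lagged-Sec lag above 10 and up to 30 seconds engages Flow Control without giving the chained secondaries a reason to pick a new sync source, which is the window the ticket describes.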

If MongoDB were to consider the interplay between maxSyncSourceLagSecs and flowControlTargetLagSeconds in environments with chained replication enabled, and have secondaries re-evaluate their sync source before hitting flowControlTargetLagSeconds (or maybe shortly after?), then situations like the above would be avoided.

Some options I thought of:

  • Gossip the value of flowControlTargetLagSeconds from the current Primary to the other replica set members and automatically adjust maxSyncSourceLagSecs as a percentage of the former value. This would apply only when replication chaining is enabled.
  • Consider this dynamic adjustment only for nodes that have votes and priority set to 1 or above, meaning that they count towards the majority committed lag/point.
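
A minimal sketch of how the two options above could combine, purely to illustrate the proposal. Nothing here exists in the server: the function name, its parameters, and the 80% default are hypothetical, and the gossiped value is assumed to arrive through some mechanism not shown here.

```python
def effective_max_sync_source_lag(
    configured_max_secs,     # local maxSyncSourceLagSecs
    gossiped_target_secs,    # flowControlTargetLagSeconds gossiped by the primary, or None
    chaining_enabled,
    votes,
    priority,
    percent=80,              # hypothetical: fraction of the target to use, as a percentage
):
    """Return the lag threshold a secondary would use when re-evaluating its sync source."""
    if not chaining_enabled or gossiped_target_secs is None:
        return configured_max_secs
    # Second option in the ticket: adjust only for members with votes and priority of 1 or above.
    if votes < 1 or priority < 1:
        return configured_max_secs
    # Never loosen the locally configured limit.
    return min(configured_max_secs, gossiped_target_secs * percent / 100)
```

With the defaults from this ticket (30 and 10 seconds), a voting secondary would re-evaluate its sync source at 8 seconds of lag, before Flow Control's 10-second target is reached.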

Regards
Diego



 Comments   
Comment by Diego Rodriguez (Inactive) [ 22/Aug/22 ]

Hi daniel.gottlieb@mongodb.com,

The disadvantage I see with that approach is that we act once the problem is already there: a majority of your nodes are lagging and flow control is already engaged and throttling writes.

By propagating the flow control configuration you can directly avoid engaging flow control in scenarios like the one above by telling your Secondaries to re-evaluate the sync source if the lag against the source is about to get close to flowControlTargetLagSeconds.

Comment by Daniel Gottlieb (Inactive) [ 27/Jun/22 ]

Maybe a simpler alternative to having the primary propagate its flow control configuration is to propagate its state instead, i.e., "I am currently throttling due to flow control", and to use that information to hint to chained secondaries that they should change their sync source.
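
As a rough illustration of the state-propagation idea in this comment, the sketch below assumes a hypothetical `primary_throttling` flag carried in heartbeats. No such field is described in this ticket as existing today, and the type and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    # Hypothetical flag: True while the primary is throttling writes due to flow control.
    primary_throttling: bool


def wants_new_sync_source(heartbeat, sync_source, primary):
    """A chained secondary treats the primary's throttling state as a hint to re-evaluate."""
    is_chained = sync_source != primary
    return heartbeat.primary_throttling and is_chained
```

A secondary that already syncs from the primary ignores the hint, since changing its source would not shorten the chain.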

Generated at Thu Feb 08 06:08:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.