Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 5.2.0, 5.3.1, 5.2.1, 5.0.9, 4.4.15, 4.2.21
Component/s: None
Labels:
- former-quick-wins
- replication

Assigned Teams:

Replication
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Hi Team,

Starting in MongoDB v4.2 the Flow Control Mechanism was introduced in order to limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value of flowControlTargetLagSeconds.

At the same time, and whenever replication chaining is enabled, the sync source of a secondary will be changed if the most recent OpTime of the sync source is more than maxSyncSourceLagSecs seconds behind another member's latest oplog entry. This ensures that the sync source is not too far behind other nodes in the set. maxSyncSourceLagSecs is a server parameter and has a default value of 30 seconds.

The problem is that the value of maxSyncSourceLagSecs is bigger (3x) than the default value of 10 seconds for flowControlTargetLagSeconds and that can result in primary nodes being throttled by the Flow Control mechanism just because one secondary lags behind while enough secondary nodes to make up a majority also replicate from it. Imagine the following scenario:

Pri (DC1), Sec (DC1), Lagged-Sec (DC2), Chained-Sec (DC3), Chained-Sec (DC3)
Sec syncs from Pri, Lagged-Sec syncs from Pri, and Chained-Sec syncs from Lagged-Sec
If there are any issues with Lagged-Sec that results in accumulating replication lag between 10 to 30 seconds, it ends up with 3 nodes having lag of above 10 seconds and kicking off Flow Control
The above can result in severe impact to applications and it could go on and on if the lag floats between 10 to 30 seconds, or the lag presents as isolated spikes on that very same range.

If MongoDB were to consider the interplay between maxSyncSourceLagSecs and flowControlTargetLagSeconds in enviornments with chained replication enabled and revaluate its sync source before hitting flowControlTargetLagSeconds (or maybe shortly after?), then situations like the above would be avoided.

Some options I thought of:

Gossip the value of flowControlTargetLagSeconds from the current Primary to the other replica set members and automatically adjust maxSyncSourceLagSecs as a percentage of the former value. This would apply only when replication chaining is enabled.
Consider this dynamic adjustment only for nodes that have votes and priority set to 1 or above, meaning that they count towards the majority committed lag/point.

Regards
Diego

Assignee:: [DO NOT USE] Backlog - Replication Team
Reporter:: Diego Rodriguez (Inactive)
Participants:: [DO NOT USE] Backlog - Replication Team, Daniel Gottlieb, Diego Rodriguez
Votes:: 0 Vote for this issue
Watchers:: 12 Start watching this issue

Created:: Jun 22 2022 01:08:52 PM UTC
Updated:: Nov 01 2023 09:46:07 PM UTC

Details

Description

Attachments

Forms

Activity

People

Dates