Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels:
None

Operating System:
ALL
Steps To Reproduce:
Create a 7-member replset spread across 2 DCs with limited WAN capacity.

Apply write load to primary that just exceeds WAN link capacity.

Remote secondaries will start lagging until lag reaches 30 seconds.

All remote secondaries will then switch their sync source to a node in the main DC, which makes the lag much worse until every remote secondary falls off the oplog.
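
The arithmetic behind these steps can be sketched with a rough model. It is not a measurement of any real deployment: it assumes every secondary pulling across the WAN gets an equal share of the link, and the function names and the example rates (105 and 100, in arbitrary but matching units) are made up for illustration.

```python
def lag_growth_per_sec(write_rate: float, wan_capacity: float, wan_streams: int) -> float:
    """Seconds of replication lag gained per wall-clock second by one remote stream.

    Assumption: the WAN link is shared equally among `wan_streams` secondaries,
    and write_rate / wan_capacity use the same unit (e.g. MB/s).
    """
    per_stream = wan_capacity / wan_streams
    # A stream that can keep up gains no lag; otherwise it falls behind by the shortfall.
    return max(0.0, 1.0 - per_stream / write_rate)


def secs_until_lag(target_lag_secs: float, growth: float) -> float:
    """Wall-clock seconds until lag reaches target_lag_secs at a constant growth rate."""
    return float("inf") if growth == 0 else target_lag_secs / growth


# Hypothetical numbers: write load slightly above WAN capacity.
one_stream = lag_growth_per_sec(write_rate=105, wan_capacity=100, wan_streams=1)
three_streams = lag_growth_per_sec(write_rate=105, wan_capacity=100, wan_streams=3)

print(secs_until_lag(30, one_stream))   # time for a single chained stream to reach 30 s of lag
print(three_streams / one_stream)       # how much faster lag grows once three streams share the link
```

Under these assumptions a single chained stream falls behind slowly, while three streams sharing the same link each fall behind many times faster, which is the "much worse" behaviour described in the last step.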
Sprint:
Repl 2018-07-30
Case:
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Consider a replica set spread over two DCs connected by a WAN, with multiple secondaries in each DC. (This is not an uncommon scenario for users with main and DR sites.) Under normal conditions, secondaries choose sync sources that minimize lag, so the remote secondaries chain from one another and only a single copy of the replicated data crosses the WAN.

Now consider what happens when the WAN becomes overloaded. The remote secondary replicating from the main site will start to lag because it cannot pull operations across the WAN fast enough. The other remote secondaries will notice this lag and, when it reaches 30 seconds (maxSyncSourceLagSecs), will re-evaluate their sync sources and select one of the nodes in the main DC. Each of those secondaries then pulls its own copy of the oplog across the WAN, which puts additional strain on the already overloaded link and makes it more likely that all remote secondaries will fall off the oplog as they fall further and further behind the primary.
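
The scenario above can be illustrated with a toy model. This is a simplified sketch, not the server's actual sync source selection code: the `Member` class, the node names, and the rule "switch to the freshest member when the current source is more than the threshold behind it" are assumptions made for illustration. Only the 30-second threshold comes from this ticket.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_SYNC_SOURCE_LAG_SECS = 30  # threshold named in this ticket


@dataclass
class Member:
    name: str
    dc: str
    optime_secs: float                 # time of the last applied operation
    sync_source: Optional[str] = None  # None for the primary


def wan_streams(members: Dict[str, Member]) -> int:
    """Count members whose sync source lives in a different DC."""
    return sum(
        1 for m in members.values()
        if m.sync_source is not None and members[m.sync_source].dc != m.dc
    )


def reevaluate(members: Dict[str, Member], max_lag: float = MAX_SYNC_SOURCE_LAG_SECS) -> None:
    """Simplified rule: drop a sync source that is more than max_lag behind
    the freshest member, and sync from that freshest member instead."""
    freshest = max(members.values(), key=lambda m: m.optime_secs)
    for m in members.values():
        if m.sync_source is None:
            continue
        source = members[m.sync_source]
        if freshest.optime_secs - source.optime_secs > max_lag:
            m.sync_source = freshest.name


def build_example(remote_lag_secs: float) -> Dict[str, Member]:
    """Hypothetical 7-member set: 4 nodes in the main DC, 3 in the remote DC."""
    now = 1000.0
    nodes = [
        Member("m1", "main", now),  # primary
        Member("m2", "main", now, "m1"),
        Member("m3", "main", now, "m1"),
        Member("m4", "main", now, "m1"),
        Member("r1", "remote", now - remote_lag_secs, "m2"),  # the only WAN stream
        Member("r2", "remote", now - remote_lag_secs, "r1"),  # chained locally
        Member("r3", "remote", now - remote_lag_secs, "r1"),  # chained locally
    ]
    return {n.name: n for n in nodes}


members = build_example(remote_lag_secs=31)
print(wan_streams(members))  # streams crossing the WAN before re-evaluation
reevaluate(members)
print(wan_streams(members))  # streams crossing the WAN after re-evaluation
```

In this model, once the chained remote node is more than 30 seconds behind, the two secondaries that were syncing locally both move to a main DC node, so the number of oplog copies crossing the WAN rises from one to three just as the link is already saturated.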

Assignee: Tess Avitabile (Inactive)

Reporter: James Kovacs

Participants: James Kovacs, Tess Avitabile

Votes: 0

Watchers: 9

Created: Jul 03 2018 05:29:00 PM UTC

Updated: Jul 26 2018 06:22:27 PM UTC

Resolved: Jul 12 2018 04:15:13 PM UTC
