SERVER-35950 (Core Server)

Replication storm when replica set members lag over WAN

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • ALL
    • Steps to reproduce:
      1. Create a 7-member replica set spread across 2 DCs with limited WAN capacity.
      2. Apply a write load to the primary that just exceeds the WAN link capacity.
      3. The remote secondaries will start lagging, and the lag will grow until it reaches 30 seconds.
      4. All remote secondaries will then switch to a node in the main DC as their sync source, which makes the lag much worse until all remote secondaries fall off the oplog.
    • Repl 2018-07-30

      Consider a replica set spread over two DCs connected by a WAN, with multiple secondaries in each DC. (This is not an uncommon scenario for users with main and DR sites.) Under normal conditions, most secondaries choose sync sources that minimize lag, so the secondaries chain in such a way that only a single copy of the replicated data crosses the WAN.

      Now consider what happens when the WAN becomes overloaded. The remote secondary replicating from the main site starts to lag because it cannot pull operations across the WAN fast enough. The other remote secondaries notice this lag, and when it hits 30 seconds (maxSyncSourceLagSecs), they re-evaluate their sync sources and select one of the nodes in the main DC. This puts additional strain on the already overloaded WAN and makes it more likely that all remote secondaries will fall off the oplog as they fall further and further behind the primary.
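
The feedback loop described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's actual sync source selection code: the member names, the 4/3 split between DCs, the lag value, and the helper functions `wan_streams` and `reevaluate` are all hypothetical, and the rule "switch when the current source lags by more than the threshold" is a simplification of the real re-evaluation logic.

```python
# Toy model of the sync source switch; not MongoDB's real selection algorithm.
MAX_SYNC_SOURCE_LAG_SECS = 30  # the 30 second threshold named in the report


def wan_streams(sync_source, dc_of):
    """Count replication streams whose member and sync source sit in different DCs."""
    return sum(1 for member, src in sync_source.items() if dc_of[member] != dc_of[src])


def reevaluate(sync_source, lag_secs, main_dc_node):
    """Simplified rule: a member drops a source that lags past the threshold
    and picks a node in the main DC instead."""
    updated = dict(sync_source)
    for member, src in sync_source.items():
        if lag_secs.get(src, 0) > MAX_SYNC_SOURCE_LAG_SECS:
            updated[member] = main_dc_node
    return updated


# Hypothetical 7-member layout: primary "p" and m1..m3 in the main DC, r1..r3 remote.
dc_of = {"p": "main", "m1": "main", "m2": "main", "m3": "main",
         "r1": "dr", "r2": "dr", "r3": "dr"}

# Normal chaining: only r1 pulls across the WAN; r2 and r3 chain from r1.
before = {"m1": "p", "m2": "p", "m3": "p", "r1": "m1", "r2": "r1", "r3": "r1"}

# Assumed state once the WAN is overloaded: r1 has fallen 31 seconds behind.
lag_secs = {"r1": 31}
after = reevaluate(before, lag_secs, main_dc_node="m1")

print(wan_streams(before, dc_of), wan_streams(after, dc_of))
```

In this model the number of copies of the oplog stream crossing the WAN goes from one to three at the moment the link is already saturated, which is the storm the report describes.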

            Assignee: tess.avitabile@mongodb.com Tess Avitabile (Inactive)
            Reporter: james.kovacs@mongodb.com James Kovacs
            Votes: 0
            Watchers: 9