SERVER-8235

Too-frequent sync source changes cause a node to fall behind

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0-rc1
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      1) Set up a replica set with 5 nodes A, B, C, D, E, where A is the primary
      2) Apply an artificial slaveDelay of 40s to nodes B and C, and of 80s to node D
      3) Artificially introduce network latency such that D's ping time to B and C is low but varies, and the latency from D to E is high (so E will never be chosen as a sync source)
      4) Symmetrically blackhole connections between A and D, to make sure D doesn't choose the primary as a sync source initially (this could be done differently, it was just the easiest way)
      5) Start a single-threaded JavaScript insert load on node A
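
      The delays in steps 1 and 2 can be sketched as a replica set config document. This is a minimal sketch, not the attached test script: the host names and the set name `syncTest` are hypothetical, and `buildDelayedConfig` is an illustrative helper. It assumes the 2.x-era member fields `slaveDelay` (seconds) and `priority`, with delayed members set to priority 0.

```javascript
// Hypothetical host names; substitute the real addresses of nodes A-E.
const hosts = {
  A: "hostA:27017",
  B: "hostB:27017",
  C: "hostC:27017",
  D: "hostD:27017",
  E: "hostE:27017",
};

// Delays in seconds, taken from step 2 of the report.
const delays = { B: 40, C: 40, D: 80 };

// Build a config document in the shape accepted by rs.initiate().
// Delayed members get priority 0 so they can never become primary.
function buildDelayedConfig(setName, hosts, delays) {
  const members = Object.keys(hosts).map((name, i) => {
    const member = { _id: i, host: hosts[name] };
    if (delays[name] !== undefined) {
      member.priority = 0;
      member.slaveDelay = delays[name];
    }
    return member;
  });
  return { _id: setName, members };
}

const config = buildDelayedConfig("syncTest", hosts, delays);
```

      In the mongo shell, connected to node A, the resulting `config` object would be passed to `rs.initiate(config)`. The network latency and blackholing in steps 3 and 4 have to be set up outside the config.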

      The idea here is that there are three lagging nodes in the replica set: B, C, and D. Because E is more than 30s ahead of B and C, D will keep trying to change sync sources. However, since B and C are closer in ping time than E, E will never be chosen, and D's sync source keeps changing between B and C depending on the stochastic ping delay.

      This kind of problem might occur in the wild if three lagging nodes were in a separate data center, for example. Once a node falls too far behind the others, the sync swapping would push the node into permanent recovery mode (until one of the other nodes catches up).
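
      The flapping described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's sync source selection code: it assumes D re-evaluates its source after each batch whenever that source is more than 30s behind the freshest member, and that it then picks the candidate with the lowest ping. The ping values, `MAX_LAG_SECS`, and the function names are all invented for illustration.

```javascript
// Toy model only: NOT the server's actual sync source selection logic.
const MAX_LAG_SECS = 30; // assumed threshold, based on the "30s" in the report

// Pick the candidate with the lowest ping time.
function pickSource(pings) {
  return Object.keys(pings).reduce((best, name) =>
    pings[name] < pings[best] ? name : best
  );
}

// pingSamples: one { member: pingMs } object per replicated batch.
// lagSecs: how far each candidate is behind the freshest member.
function simulate(pingSamples, lagSecs) {
  let current = null;
  let switches = 0;
  const history = [];
  for (const pings of pingSamples) {
    // Re-evaluate if there is no source yet or the source is too stale.
    if (current === null || lagSecs[current] > MAX_LAG_SECS) {
      const next = pickSource(pings);
      if (current !== null && next !== current) switches += 1;
      current = next;
    }
    history.push(current);
  }
  return { switches, history };
}

// Invented ping samples: B and C are close and jitter, E is always far away.
const samples = [
  { B: 5, C: 7, E: 200 },
  { B: 8, C: 4, E: 200 },
  { B: 3, C: 9, E: 200 },
  { B: 9, C: 6, E: 200 },
];

// B and C are 40s behind E, so D is never satisfied with either of them.
const flapping = simulate(samples, { B: 40, C: 40, E: 0 });
// With sources inside the threshold, D keeps its first choice.
const stable = simulate(samples, { B: 10, C: 10, E: 0 });
```

      In this model `flapping.history` alternates between B and C on every batch and never contains E, while `stable.history` stays on one source. Each switch in the real system costs time during which D is not applying operations, which is how the node falls behind.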


      Description

      Given particular replica set configurations and network delays, it's possible for a node to repeatedly change the host it syncs from after every replicated batch. This causes the node to fall behind, and eventually the node ends up in recovery mode.

        Attachments

        1. currentTest_fallbehind.txt
          897 kB
        2. currentTest.txt
          1.97 MB
        3. helpers.js
          5 kB
        4. sync_fast_switch.js
          3 kB
        5. sync_fast_switch.js
          3 kB

              People

              Assignee:
              kristina Kristina Chodorow
              Reporter:
              greg_10gen Greg Studer
              Votes: 0
              Watchers: 4
