Core Server / SERVER-48381

Allow syncing from a node with the same optime if it doesn't introduce a cycle

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Sprint: Repl 2020-06-15, Repl 2020-06-29

      When a node re-evaluates its sync source, it checks whether there is a closer node that is ahead of it, using heartbeat data to determine whether the other node is ahead. In some configurations, nodes in the same data center (and therefore very close to each other) cannot choose one another as a sync source: they are at similar optimes, and stale heartbeat data keeps each node from concluding that it is behind the others.

      For example, consider the attached image. If A and B are both syncing from the primary, they likely have similar network latency. For B to decide to switch to A, heartbeat information must show that A is ahead of B. Because heartbeats happen only every 2 seconds and their data can be stale, B might not see A as ahead of it for a long time, which prevents the set from settling on a single link between data centers.
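
      The situation above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: optimes are modeled as plain integers, A and B apply operations at exactly the same rate, and the names `HeartbeatView`, `is_ahead_per_heartbeat`, and `simulate` are hypothetical, not taken from the server code. Only the 2 second heartbeat interval comes from the description.

```python
from dataclasses import dataclass
from typing import List

HEARTBEAT_INTERVAL_SECS = 2  # heartbeat interval mentioned in the description


@dataclass
class HeartbeatView:
    """What B last learned about A from a heartbeat (hypothetical model)."""
    optime: int      # assumption: an optime is modeled as a simple counter
    is_closer: bool  # assumption: A is closer to B than B's current source


def is_ahead_per_heartbeat(my_optime: int, view: HeartbeatView) -> bool:
    """Current rule as described: the candidate must be closer AND strictly ahead."""
    return view.is_closer and view.optime > my_optime


def simulate(seconds: int, ops_per_sec: int) -> List[bool]:
    """B re-evaluates once per second while A and B replicate at the same rate."""
    a_optime = 0
    b_optime = 0
    view_of_a = HeartbeatView(optime=0, is_closer=True)
    decisions = []
    for t in range(1, seconds + 1):
        a_optime += ops_per_sec
        b_optime += ops_per_sec
        # B's view of A is refreshed only when a heartbeat arrives.
        if t % HEARTBEAT_INTERVAL_SECS == 0:
            view_of_a.optime = a_optime
        decisions.append(is_ahead_per_heartbeat(b_optime, view_of_a))
    return decisions


if __name__ == "__main__":
    # B's view of A is at best equal to B's own optime, so B never switches.
    print(any(simulate(seconds=10, ops_per_sec=5)))
```

      In this model B's view of A is either equal to B's own optime (right after a heartbeat) or older, so the strict "ahead" check never passes even though A would be a perfectly good sync source.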

      One possible solution is to relax the constraint that a node must be ahead of the syncing node to be considered a valid sync source. Ideally, we would do this only when it cannot cause a sync source cycle. If that's not possible, one option is to implement a distributed cycle detection algorithm.
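
      A rough sketch of the relaxed rule follows. It assumes a single, consistent map of every node's current sync source, which no real node has; that gap is exactly why a distributed cycle detection algorithm might be needed. The names `would_create_cycle` and `is_valid_sync_source` and the integer optimes are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional


def would_create_cycle(
    sync_source_of: Dict[str, Optional[str]], node: str, candidate: str
) -> bool:
    """Return True if `node` syncing from `candidate` would close a loop.

    Follows the candidate's chain of sync sources; reaching `node` means
    the new link would form a cycle.
    """
    seen = set()
    current: Optional[str] = candidate
    while current is not None and current not in seen:
        if current == node:
            return True
        seen.add(current)
        current = sync_source_of.get(current)
    return False


def is_valid_sync_source(
    sync_source_of: Dict[str, Optional[str]],
    optimes: Dict[str, int],
    node: str,
    candidate: str,
) -> bool:
    """Relaxed rule: allow an equal optime when no cycle would result."""
    if optimes[candidate] > optimes[node]:
        return True  # existing rule: candidate is strictly ahead
    if optimes[candidate] == optimes[node]:
        return not would_create_cycle(sync_source_of, node, candidate)
    return False  # candidate is behind


if __name__ == "__main__":
    # P is the primary; A and B both sync from P and sit at the same optime.
    sources = {"P": None, "A": "P", "B": "P"}
    optimes = {"P": 10, "A": 7, "B": 7}
    print(is_valid_sync_source(sources, optimes, "B", "A"))  # B may pick A
    sources["B"] = "A"
    print(is_valid_sync_source(sources, optimes, "A", "B"))  # A may not pick B
```

      With a global view, B can switch to A at an equal optime, while A is then prevented from choosing B because that would close the loop A → B → A.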

            Assignee: Xuerui Fa (xuerui.fa@mongodb.com)
            Reporter: Samyukta Lanka (samy.lanka@mongodb.com)
            Votes: 0
            Watchers: 7
