SERVER-36775 (Core Server)

Replication sync issue

    • Type: Question
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.20
    • Component/s: Replication
    • Labels:
      None

      We are experiencing a strange error in replication. We are using "chainingAllowed" : true. Sometimes replication randomly stops and replica set members are unable to find a valid sync source. Instead, a member keeps retrying the same sync source over and over until it can no longer catch up at all because its oplog is too stale. Here is the log of a failing replica member:

      2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
      2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
      2018-08-20T15:19:38.381-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
      2018-08-20T15:19:39.381-0600 I REPL [SyncSourceFeedback] setting syncSourceFeedback to redacted-host-name-03.local:27017
      2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
      2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
      2018-08-20T15:19:39.394-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
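
For anyone trying to confirm the same loop in their own logs, the following is a minimal sketch that counts how often the rsBackgroundSync "Chose same sync source candidate as last time" message repeats per host. It relies only on the message text shown in the log above. The function names and the threshold of 2 are illustrative assumptions, not part of any MongoDB tooling.

```python
import re
from collections import Counter

# Matches the rsBackgroundSync message from the log excerpt and captures host:port.
SAME_SOURCE = re.compile(
    r"Chose same sync source candidate as last time, (?P<host>\S+:\d+)\."
)


def count_repeated_choices(log_text: str) -> Counter:
    """Count how many times each host was re-chosen as the same sync source."""
    return Counter(m.group("host") for m in SAME_SOURCE.finditer(log_text))


def stuck_sources(log_text: str, threshold: int = 2) -> list:
    """Return hosts re-chosen at least `threshold` times (threshold is an assumption)."""
    counts = count_repeated_choices(log_text)
    return sorted(host for host, n in counts.items() if n >= threshold)


# Two of the lines from the log excerpt in this report.
sample = """\
2018-08-20T15:19:38.381-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second
2018-08-20T15:19:39.394-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second
"""

print(stuck_sources(sample))  # ['redacted-host-name-03.local:27017']
```

A host that shows up here many times in a short window is a reasonable hint that the member is in the loop described in this report.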

      Restarting the mongod service or changing the replica set config seems to force the replica member out of this loop and allows it to sync again from a non-stale member.
      The expected behavior would be for the replica member to try a different sync source instead of the same one over and over again.
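
To make the expected behavior concrete, here is a small sketch of a selection rule that skips any candidate more than 30 seconds behind the freshest known member, using the two OpTime timestamps from the log above. This is not MongoDB's actual sync source selection algorithm. The `Member` class, the function names, and the "pick the freshest candidate" rule are all assumptions made for illustration, and the year 2018 is taken from the log line timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# The "30s" threshold quoted in the ReplicationExecutor log message.
MAX_SYNC_SOURCE_LAG = timedelta(seconds=30)


@dataclass(frozen=True)
class Member:
    """Hypothetical view of a replica set member as seen by the syncing node."""
    host: str
    last_optime: datetime


def is_stale(candidate: Member, members: List[Member]) -> bool:
    """True if the candidate lags the freshest known member by more than 30s."""
    newest = max(m.last_optime for m in members)
    return newest - candidate.last_optime > MAX_SYNC_SOURCE_LAG


def choose_sync_source(members: List[Member]) -> Optional[Member]:
    """Pick the freshest non-stale candidate, or None if there is none."""
    candidates = [m for m in members if not is_stale(m, members)]
    return max(candidates, key=lambda m: m.last_optime, default=None)


# OpTime timestamps as reported in the log excerpt.
members = [
    Member("redacted-host-name-01.local:27017", datetime(2018, 8, 20, 15, 12, 28)),
    Member("redacted-host-name-03.local:27017", datetime(2018, 8, 19, 15, 29, 7)),
]

chosen = choose_sync_source(members)
print(chosen.host)  # redacted-host-name-01.local:27017
```

Under this rule redacted-host-name-03 would be rejected because it is almost a full day behind redacted-host-name-01, which is the outcome the report asks for.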

            Assignee:
            nick.brewer Nick Brewer
            Reporter:
            matthew.s.davis62.ctr@mail.mil Matthew S Davis
            Votes:
            0
            Watchers:
            6
