Type: Question
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 3.2.20
Component/s: Replication
We are experiencing a strange error in replication. We are using "chainingAllowed" : true. Sometimes replication appears to stop at random: a replica member is unable to find a valid sync source and instead keeps retrying the same sync source over and over, until it can no longer catch up at all because its oplog is too stale. Here is the log of a failing replica member:
2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
2018-08-20T15:19:38.381-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
2018-08-20T15:19:39.381-0600 I REPL [SyncSourceFeedback] setting syncSourceFeedback to redacted-host-name-03.local:27017
2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
2018-08-20T15:19:39.394-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
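To put a number on how stale the chosen sync source is, the two OpTime timestamps from the log above can be subtracted. This is only a rough sketch: `parse_optime` is a hypothetical helper, the year 2018 is taken from the log line prefix, and the trailing hex field (`1bf`, `c5`) is assumed to be the timestamp's increment counter and is ignored.

```python
from datetime import datetime

# Threshold quoted in the log message ("more than 30s behind").
MAX_SYNC_SOURCE_LAG_SECS = 30


def parse_optime(text, year=2018):
    """Parse a log timestamp such as 'Aug 19 15:29:07:1bf' (hypothetical helper).

    The part after the last colon is assumed to be an increment counter
    and is dropped; the year is not in the log field, so it is supplied.
    """
    stamp = text.rsplit(":", 1)[0]
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Values copied from the log: current sync source vs. the most advanced member.
sync_source_optime = parse_optime("Aug 19 15:29:07:1bf")
freshest_optime = parse_optime("Aug 20 15:12:28:c5")

lag_seconds = (freshest_optime - sync_source_optime).total_seconds()
is_stale = lag_seconds > MAX_SYNC_SOURCE_LAG_SECS

print(f"sync source lag: {lag_seconds:.0f}s, stale: {is_stale}")
```

By this calculation the sync source that keeps being re-selected is almost 24 hours behind, far beyond the 30 second threshold mentioned in the log.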
Restarting the mongod service or changing the replica set config seems to force the replica member out of this loop and allows it to sync again from a non-stale member. Nickolas Golubev @ 16:03
The expected behavior would be for the replica member to try a different sync source instead of the same one over and over again.
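The expected behavior can be illustrated with a small sketch. This is not MongoDB's actual sync source selection (which weighs other factors as well); `Member` and `choose_sync_source` are hypothetical names, and the 30 second threshold is the one quoted in the log message. The point is only that once the current source is found to be stale, it should be excluded from the next choice.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str
    optime_secs: int  # most recent optime, in seconds (illustrative unit)


def choose_sync_source(members, current_source, max_lag_secs=30):
    """Keep the current source if it is fresh, otherwise pick another member."""
    newest = max(m.optime_secs for m in members)
    by_host = {m.host: m for m in members}

    current = by_host.get(current_source)
    if current is not None and newest - current.optime_secs <= max_lag_secs:
        return current.host  # current source is not stale, keep it

    # Current source is stale (or unknown): exclude it and take the freshest other member.
    others = [m for m in members if m.host != current_source]
    if not others:
        return None
    return max(others, key=lambda m: m.optime_secs).host


members = [
    Member("redacted-host-name-01.local:27017", 85401),
    Member("redacted-host-name-03.local:27017", 0),
]
chosen = choose_sync_source(members, "redacted-host-name-03.local:27017")
print(chosen)
```

With the optimes from the log (host 03 roughly 85401 seconds behind host 01), this sketch moves away from host 03 instead of selecting it again.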
Duplicates: SERVER-29837 TopologyCoordinator::shouldChangeSyncSource() should consider chainingAllowed setting when comparing sync source's optime against secondaries (Closed)