Type: Question
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.20
Component/s: Replication
Labels: None
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We are experiencing a strange error in replication. We are using "chainingAllowed" : true. It seems that replication sometimes stops at random and a replica member is unable to find a valid sync source. Instead, the member keeps retrying the same sync source over and over until it can no longer catch up at all because its oplog has become too stale. Here is the log of a failing replica member:

2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
2018-08-20T15:19:38.371-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
2018-08-20T15:19:38.381-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
2018-08-20T15:19:39.381-0600 I REPL [SyncSourceFeedback] setting syncSourceFeedback to redacted-host-name-03.local:27017
2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: -1, timestamp: Aug 19 15:29:07:1bf) which is more than 30s behind member redacted-host-name-01.local:27017 whose most recent OpTime is (term: -1, timestamp: Aug 20 15:12:28:c5)
2018-08-20T15:19:39.386-0600 I REPL [ReplicationExecutor] syncing from: redacted-host-name-03.local:27017
2018-08-20T15:19:39.394-0600 I REPL [rsBackgroundSync] Chose same sync source candidate as last time, redacted-host-name-03.local:27017. Sleeping for 1 second to avoid immediately choosing a new sync source for the same reason as last time.
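
For anyone trying to confirm the same loop in their own logs, here is a minimal sketch in Python that counts how often the rsBackgroundSync message above names the same host. The function names and the `threshold` default are assumptions made for illustration; only the message text is taken from the log excerpt.

```python
import re
from collections import Counter

# Matches the rsBackgroundSync message quoted in the log excerpt.
# Only the host between "last time, " and ". Sleeping" is captured.
SAME_SOURCE = re.compile(
    r"Chose same sync source candidate as last time, (?P<host>\S+?)\. Sleeping"
)


def count_repeated_choices(log_text):
    """Return a Counter mapping host -> number of 'Chose same sync source' messages."""
    return Counter(m.group("host") for m in SAME_SOURCE.finditer(log_text))


def stuck_sources(log_text, threshold=2):
    """Return the sorted hosts that were re-chosen at least `threshold` times.

    The threshold is an arbitrary assumption, not a MongoDB setting.
    """
    counts = count_repeated_choices(log_text)
    return sorted(host for host, n in counts.items() if n >= threshold)
```

Calling `stuck_sources(open(path).read())` on a mongod log file (where `path` is whatever log file you are inspecting) lists the hosts the member keeps falling back to.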

Restarting the mongod service or changing the replica set config seems to force the replica member out of this loop and allows it to sync again from a non-stale member.
The expected behavior would be for the replica member to try a different sync source instead of retrying the same one over and over.
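
To make the expected behavior concrete, here is a rough sketch in Python of a selection rule that never returns a candidate it would immediately reject as stale. This is a simplified model, not MongoDB's actual sync source selection: the `Member` type, the function names, and the "pick the freshest candidate" rule are all assumptions, and the only value taken from the report is the 30-second threshold quoted in the log.

```python
from dataclasses import dataclass

# The "more than 30s behind" threshold quoted in the log message.
MAX_SOURCE_LAG_SECS = 30


@dataclass(frozen=True)
class Member:
    host: str
    optime_secs: int  # last applied operation time, in seconds (hypothetical model)


def is_stale(candidate, members, max_lag=MAX_SOURCE_LAG_SECS):
    """True if the candidate is more than max_lag seconds behind the freshest member."""
    newest = max(m.optime_secs for m in members)
    return newest - candidate.optime_secs > max_lag


def choose_sync_source(me, members, max_lag=MAX_SOURCE_LAG_SECS):
    """Return the freshest member that is ahead of `me` and not stale, else None.

    Assumption: choosing the freshest candidate is a simplification made here
    for illustration; it is not how the server ranks candidates.
    """
    candidates = [
        m for m in members
        if m.host != me.host
        and m.optime_secs > me.optime_secs
        and not is_stale(m, members, max_lag)
    ]
    return max(candidates, key=lambda m: m.optime_secs, default=None)
```

Under this rule, a stale member like redacted-host-name-03.local:27017 is filtered out before selection, so the member cannot pick it and then reject it again one second later for the same reason.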

duplicates

SERVER-29837 TopologyCoordinator::shouldChangeSyncSource() should consider chainingAllowed setting when comparing sync source's optime against secondaries

Closed

Assignee: Nick Brewer (Inactive)
Reporter: Matthew S Davis
Participants: Matthew S Davis, Nick Brewer
Votes: 0
Watchers: 6

Created: Aug 20 2018 11:08:16 PM UTC
Updated: Sep 15 2018 02:48:54 PM UTC
Resolved: Aug 20 2018 11:28:50 PM UTC
