-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
v4.4
-
Repl 2020-07-27, Repl 2020-09-21, Repl 2020-10-05
-
9
Initial sync is currently resumable after certain transient failures, including brief outages due to sync source restart. When a sync source goes down and back up again, we generally expect that to be observed as a network error. However, there is a specific window in which attempts against such a freshly restarted sync source can error in a way that is fatal to the initial sync process. It is possible for the oplog fetcher to try to read from the remote oplog while that node is still in STARTUP state and is at (0,0) appliedThrough and clusterTime; in such a scenario the read will fail as the sync source is unable to satisfy the afterClusterTime (0,1) read (this error happens on a request validation level). Since this is not a network error, we will not use the new oplog fetcher restart strategy and instead fall back to the old behavior, which is to retry n (default 10) times. If those retries are exhausted, then initial sync fails.
- depends on
-
SERVER-50140 Initial sync cannot survive unclean restart of the sync source
- Closed