Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.8.0, 4.4.3
Affects Version/s: None
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4
Sprint:
Repl 2020-07-27, Repl 2020-09-21, Repl 2020-10-05
Linked BF Score:
9
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Initial sync is currently resumable after certain transient failures, including brief outages caused by a sync source restart. When a sync source goes down and comes back up, we generally expect the syncing node to observe that as a network error. However, there is a specific window in which requests against a freshly restarted sync source can fail in a way that is fatal to the initial sync process.

The oplog fetcher can try to read from the remote oplog while that node is still in STARTUP state, with appliedThrough and clusterTime both at (0,0). In that scenario the read fails because the sync source cannot satisfy the afterClusterTime (0,1) read; the error is raised at the request validation level. Since this is not a network error, we do not use the new oplog fetcher restart strategy and instead fall back to the old behavior, which is to retry n times (default 10). If those retries are exhausted, initial sync fails.
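
The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch of the retry decision, not the server's implementation: `NetworkError`, `RequestValidationError`, `make_sync_source`, `run_oplog_fetcher`, and the `max_attempts` safety cap are all hypothetical names introduced here. The only value taken from the description is the default budget of 10 retries for non-network errors.

```python
# Hypothetical model of the oplog fetcher's retry decision during initial sync.
# None of these names come from the MongoDB code base.

DEFAULT_NON_NETWORK_RETRIES = 10  # "retry n (default 10) times" in the description


class NetworkError(Exception):
    """Stand-in for a network-level failure (sync source down or unreachable)."""


class RequestValidationError(Exception):
    """Stand-in for the rejected afterClusterTime read against a STARTUP node."""


def make_sync_source(network_failures, startup_failures):
    """Return a fake read function that fails in two phases before succeeding.

    The first `network_failures` calls raise NetworkError (node is down), the
    next `startup_failures` calls raise RequestValidationError (node is up but
    still in STARTUP), and every later call returns a batch.
    """
    calls = {"count": 0}

    def attempt_read():
        calls["count"] += 1
        if calls["count"] <= network_failures:
            raise NetworkError("sync source unreachable")
        if calls["count"] <= network_failures + startup_failures:
            raise RequestValidationError("cannot satisfy afterClusterTime")
        return "oplog batch"

    return attempt_read


def run_oplog_fetcher(attempt_read, max_retries=DEFAULT_NON_NETWORK_RETRIES,
                      max_attempts=1000):
    """Retry reads, treating network and non-network errors differently.

    Network errors go through the restart path and do not consume the retry
    budget. Any other error consumes one retry; once the budget is spent, the
    error propagates and (in this model) initial sync fails.
    """
    retries_left = max_retries
    for _ in range(max_attempts):  # safety cap so the sketch always terminates
        try:
            return attempt_read()
        except NetworkError:
            continue  # restart strategy: try again without spending a retry
        except RequestValidationError:
            if retries_left == 0:
                raise  # budget exhausted: fatal to initial sync
            retries_left -= 1
    raise RuntimeError("safety cap reached in simulation")
```

In this model, a sync source that stays unreachable for many attempts is survivable, while one that answers from STARTUP state more often than the retry budget allows is fatal, which mirrors the asymmetry the description points out.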

depends on

SERVER-50140 Initial sync cannot survive unclean restart of the sync source

Closed

Assignee: Xuerui Fa
Reporter: Vesselina Ratcheva (Inactive)
Participants: Githook User, Judah Schvimer, Vesselina Ratcheva, Xuerui Fa
Votes: 0
Watchers: 10

Created: Jun 29 2020 05:48:49 AM UTC
Updated: Oct 29 2023 10:06:22 PM UTC
Resolved: Sep 25 2020 09:06:37 PM UTC
Confidence Status Last Update: 09/Sep/20 7:59 PM
