[SERVER-8076] Increased tolerance around network connectivity issues on initial sync Created: 04/Jan/13  Updated: 06/Dec/22  Resolved: 23/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mark porter Assignee: Backlog - Replication Team
Resolution: Done Votes: 8
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-17139 More retries needed when there are sy... Closed
is duplicated by SERVER-14133 Make the initial sync process more re... Closed
Related
related to SERVER-18039 Add Initial Sync Skeleton to DataRepl... Closed
is related to SERVER-15410 Batch fetch missing documents during ... Closed
Assigned Teams:
Replication
Participants:

 Description   

Investigate making initial sync more tolerant of network connectivity issues, i.e. investigate whether a fresh TCP connection could resume the sync instead of restarting it from the beginning.

Three distinct areas would have to handle this better: querying the primary for its latest optime (the one they ran into), cloning, and connecting to fetch missing documents (as in their linked ticket).

One could also argue that it should be more flexible about changing initial sync targets mid-stream; however, that may be more difficult.
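
As a rough illustration of the "resume on a fresh connection" idea, here is a minimal sketch that is not taken from the server code. It assumes a hypothetical sync source that returns documents in `_id` order and can be asked for everything after a given `_id`; `FlakySource`, `TransientNetworkError`, and `resumable_clone` are invented names for this example only. On a transient error the cloner reconnects and continues after the last document it already has, rather than discarding its progress.

```python
class TransientNetworkError(Exception):
    """Stands in for a dropped connection (hypothetical, for illustration)."""


class FlakySource:
    """Hypothetical sync source that drops the connection at scripted points."""

    def __init__(self, docs, fail_at):
        self.docs = sorted(docs, key=lambda d: d["_id"])
        self.fail_at = set(fail_at)  # _id values whose first fetch fails
        self.connections = 0

    def connect(self):
        # Each call models opening a fresh TCP connection.
        self.connections += 1

    def fetch_batch(self, after_id, batch_size):
        remaining = [d for d in self.docs
                     if after_id is None or d["_id"] > after_id]
        batch = remaining[:batch_size]
        if batch and batch[0]["_id"] in self.fail_at:
            self.fail_at.discard(batch[0]["_id"])  # fail only once per point
            raise TransientNetworkError()
        return batch


def resumable_clone(source, batch_size=2, max_retries=3):
    """Clone all documents, resuming after the last cloned _id on errors."""
    cloned = []
    last_id = None   # resume point: highest _id cloned so far
    retries = 0      # consecutive failures since the last successful batch
    source.connect()
    while True:
        try:
            batch = source.fetch_batch(last_id, batch_size)
        except TransientNetworkError:
            retries += 1
            if retries > max_retries:
                raise  # give up; the caller may restart or pick a new target
            source.connect()  # fresh connection, same resume point
            continue
        retries = 0
        if not batch:
            return cloned
        cloned.extend(batch)
        last_id = batch[-1]["_id"]


source = FlakySource([{"_id": i} for i in range(1, 8)], fail_at=[3, 5])
result = resumable_clone(source)
```

In this sketch the retry counter resets after every successful batch, so only consecutive failures count toward giving up. A real implementation would also have to consider writes that happen on the source while the clone is in progress, which this example ignores.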



 Comments   
Comment by Spencer Brody (Inactive) [ 23/Nov/16 ]

Handling of transient network errors was improved as part of the initial sync rewrite that went into 3.4.

Comment by Pieter Jordaan [ 23/Aug/15 ]

This is a huge frustration. I am trying to replicate between two 3.0.5 servers with roughly 20 million documents, and it keeps failing silently. Resumable initial sync would be a great feature. I now have to sync manually with snapshots.
