[SERVER-8076] Increased tolerance around network connectivity issues on initial sync Created: 04/Jan/13 Updated: 06/Dec/22 Resolved: 23/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mark porter | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 8 |
| Labels: | sync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Description |
|
Investigate making initial sync more tolerant of network connectivity issues, i.e. investigate whether a fresh TCP connection could resume the sync instead of restarting it from the beginning. Three distinct areas would have to handle this better: the one they ran into (querying the primary for its latest optime), cloning, and connecting to fetch missing documents (as in their linked ticket). One could also argue that initial sync should be more flexible about changing sync targets mid-stream; however, that may be more difficult. |
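To illustrate the resume-instead-of-restart idea for the cloning phase, here is a minimal Python sketch. It is not MongoDB's implementation: `FlakySource`, `read_after`, and `resumable_clone` are hypothetical names, the "documents" are plain sortable ids, and a dropped TCP connection is simulated by raising `ConnectionError`. The only point it makes is that remembering the last copied id lets a fresh connection continue where the previous one stopped.

```python
class FlakySource:
    """Hypothetical stand-in for a sync source that periodically drops the connection."""

    def __init__(self, docs, fail_every=3):
        self.docs = sorted(docs)
        self.fail_every = fail_every
        self.reads = 0

    def read_after(self, last_id, batch_size=2):
        """Return up to batch_size ids greater than last_id, or fail on every Nth read."""
        self.reads += 1
        if self.reads % self.fail_every == 0:
            raise ConnectionError("simulated network drop")
        remaining = [d for d in self.docs if last_id is None or d > last_id]
        return remaining[:batch_size]


def resumable_clone(source, max_retries=5):
    """Copy every id from source, resuming from the last copied id after a failure."""
    copied = []
    last_id = None
    retries = 0
    while True:
        try:
            batch = source.read_after(last_id)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise  # give up after too many consecutive failures
            continue  # "reconnect" and resume from last_id, not from the start
        retries = 0  # a successful read resets the consecutive-failure count
        if not batch:
            return copied
        copied.extend(batch)
        last_id = batch[-1]
```

For example, `resumable_clone(FlakySource(range(1, 8)))` copies all seven ids even though some reads fail, and nothing already copied is fetched again. A source that fails on every read still surfaces the error once `max_retries` consecutive attempts are exhausted.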
| Comments |
| Comment by Spencer Brody (Inactive) [ 23/Nov/16 ] |
|
Handling of transient network errors was improved as part of the initial sync rewrite that went into 3.4. |
| Comment by Pieter Jordaan [ 23/Aug/15 ] |
|
This is a huge frustration. I am trying to replicate between two 3.0.5 servers with approximately 20 million documents, and the initial sync keeps failing silently. Resumable initial sync would be a great feature. I now have to sync manually from snapshots. |