  1. Core Server
  2. SERVER-86591

Enhance InitialSync Error Handling for Transient Network Issues

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 8.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication
    • Fully Compatible
    • v7.3, v7.0, v6.0, v5.0
    • Repl 2024-03-18, Repl 2024-04-01

      We need to reassess the InitialSync logic for handling transient network errors. Currently, this logic operates on a per-stage basis: it records the time of the first network error and, on each subsequent error, checks whether the time elapsed since that first error exceeds the configured retry period (initialSyncTransientErrorRetryPeriodSeconds), which defaults to 24 hours. However, the recorded time of the first network error is never reset, even after the stage has resumed and made progress successfully. Consequently, when cloning large collections, where the Query stage might span several days, a network hiccup on the first day and another on the last day is enough to mark the entire initial sync attempt as failed and trigger a retry of the initial sync from the beginning.
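
The failure mode described above can be illustrated with a small model. This is a minimal sketch, not the server's actual code: the class `TransientErrorTracker`, its methods, the `reset_on_success` flag, and the helper `survives` are all hypothetical names invented for illustration. Resetting the timer on success is shown only as one possible remedy, not necessarily the fix that was merged.

```python
from dataclasses import dataclass
from typing import Optional

DAY_SECS = 24 * 60 * 60
# Retry period of 24 hours, as stated in the ticket description.
RETRY_PERIOD_SECS = DAY_SECS


@dataclass
class TransientErrorTracker:
    """Hypothetical model of the per-stage transient-error bookkeeping."""

    retry_period_secs: float = RETRY_PERIOD_SECS
    # Assumption: one possible remedy is to clear the timer after progress.
    reset_on_success: bool = False
    first_error_time: Optional[float] = None

    def on_network_error(self, now: float) -> bool:
        """Return True if the stage may keep retrying, False if it must fail."""
        if self.first_error_time is None:
            # First error in this window: remember when it happened.
            self.first_error_time = now
            return True
        # Later errors are compared against the FIRST error's timestamp.
        return (now - self.first_error_time) <= self.retry_period_secs

    def on_success(self) -> None:
        """Called when the stage makes progress after an error."""
        if self.reset_on_success:
            self.first_error_time = None


def survives(tracker: TransientErrorTracker, events) -> bool:
    """Replay (kind, timestamp) events; False means the sync attempt failed."""
    for kind, timestamp in events:
        if kind == "error":
            if not tracker.on_network_error(timestamp):
                return False
        else:
            tracker.on_success()
    return True


# A multi-day Query stage: a hiccup on day 1, progress, a hiccup on day 4.
events = [
    ("error", 0.5 * DAY_SECS),
    ("success", 0.6 * DAY_SECS),
    ("error", 3.5 * DAY_SECS),
]

current_behavior = survives(TransientErrorTracker(), events)
with_reset = survives(TransientErrorTracker(reset_on_success=True), events)
```

In this model, `current_behavior` is `False` because the second error is measured against the first error three days earlier, while `with_reset` is `True` because the successful progress in between cleared the recorded timestamp.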

            Assignee:
            m.maher@mongodb.com Moustafa Maher
            Reporter:
            m.maher@mongodb.com Moustafa Maher
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: