Add logic to retry where needed when network errors are encountered


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • Repl 2026-03-30

      Disagg PIT restore may hit network errors. We need to retry accordingly.

      Ideally, we're already using lower-level retry machinery and so there's nothing for us to build here.

      Since we're reusing the primary and standby codepath, we should get log server retryability for free. However, retryability on the ORP is unknown. We need to make sure that if we hit a network error when talking to the ORP, we retry correctly and don't miss data. We should also put a timeout on the connection with the ORP in case it takes too long.
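
As a rough illustration of the ORP requirement above (retry on network errors, bound each attempt with a timeout, and don't miss data), here is a minimal Python sketch. Everything in it is hypothetical and not taken from the codebase: `fetch_batch`, `NetworkError`, the `(position, payload)` entry shape, and the backoff parameters are assumptions made for the example. The idea is to resume each retry from the last position that was actually consumed, so a failed attempt can neither skip nor duplicate entries.

```python
import time
from typing import Callable, List, Tuple


class NetworkError(Exception):
    """Hypothetical stand-in for a retryable network failure."""


Entry = Tuple[int, str]  # (position, payload); assumed shape for this sketch


def fetch_all_with_retry(
    fetch_batch: Callable[[int, float], List[Entry]],
    start_after: int,
    timeout_secs: float = 5.0,
    max_attempts: int = 5,
    base_backoff_secs: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Entry]:
    """Collect all entries with position > start_after.

    fetch_batch(after, timeout) is assumed to return entries with
    position > after (an empty list when caught up) and to raise
    NetworkError or TimeoutError on failure.
    """
    consumed: List[Entry] = []
    last = start_after
    failures = 0  # consecutive failures since the last successful batch
    while True:
        try:
            batch = fetch_batch(last, timeout_secs)
        except (NetworkError, TimeoutError):
            failures += 1
            if failures >= max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff before retrying from the same position.
            sleep(base_backoff_secs * 2 ** (failures - 1))
            continue
        failures = 0
        if not batch:
            return consumed
        for pos, payload in batch:
            if pos <= last:
                continue  # ignore anything already consumed
            consumed.append((pos, payload))
            last = pos
```

Because the cursor `last` only advances after an entry is consumed, a retry re-requests exactly the data that was not yet seen. Whether the ORP supports this kind of positional resume is one of the things the ticket would need to confirm.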

      Additionally, we should make sure that if the retry logic is triggered, it doesn't break Disagg PIT Restore, for example by racing with step up.
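
One possible way to keep a retry loop from racing with a state change such as step up is to capture a generation marker before the first attempt and abandon the retry if it changes. The sketch below is only an assumption about how that could look: `get_term`, `TermChanged`, and the use of Python's built-in `ConnectionError` and `TimeoutError` as the retryable errors are all hypothetical and not taken from the codebase.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TermChanged(Exception):
    """Hypothetical signal that node state changed while retrying."""


def retry_while_term_unchanged(
    op: Callable[[], T],
    get_term: Callable[[], int],
    max_attempts: int = 3,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run op, retrying on retryable errors only while the term is stable."""
    start_term = get_term()
    for attempt in range(1, max_attempts + 1):
        # Do not start (or restart) work under a different term.
        if get_term() != start_term:
            raise TermChanged()
        try:
            result = op()
        except retryable:
            if attempt == max_attempts:
                raise
            continue
        # Discard a result that was produced across a term change.
        if get_term() != start_term:
            raise TermChanged()
        return result
    raise ValueError("max_attempts must be at least 1")
```

The check after `op()` returns matters as much as the one before it: a result computed while the term changed is dropped rather than acted on.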

            Assignee:
            Evelyn Wu
            Reporter:
            Vishnu Kaushik
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: