Resharding participants don't handle change streams monitor failures immediately


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL
    • 3
    • TBD

      Resharding donors and recipients start change stream monitors but do not handle monitor failures immediately. Instead, errors remain unchecked until the participants later call awaitChangeStreamsMonitorCompleted(), which delays error detection and can allow operations to continue only to fail later.

      Current Behavior:

      • When the participants start monitoring, the changeStreamsMonitor returns a semi future.
      • The returned future is stored but not immediately checked for errors.
      • If the monitor fails (e.g., due to network issues, cursor problems, etc.), the error sits in the fulfilled future.
      • Donor/Recipient continues with other operations, unaware of the monitor failure.
      • Error is only surfaced much later when awaitChangeStreamsMonitorCompleted() is called.
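
      The sequence above can be illustrated with a minimal sketch. It is written in Python with the standard concurrent.futures.Future, not the server's C++ future types, and the Participant class, its methods, and the step names are hypothetical stand-ins rather than actual resharding code:

```python
from concurrent.futures import Future


class Participant:
    """Hypothetical stand-in for a resharding donor or recipient."""

    def __init__(self):
        self.monitor_future = None
        self.steps_run = []

    def start_monitor(self, monitor_future):
        # The future is stored, but nothing inspects it for errors.
        self.monitor_future = monitor_future

    def run_step(self, name):
        # Later steps proceed without consulting the monitor future.
        self.steps_run.append(name)

    def await_monitor_completed(self):
        # Only here does a stored error surface: result() re-raises it.
        return self.monitor_future.result()


def demo_delayed_detection():
    fut = Future()
    participant = Participant()
    participant.start_monitor(fut)
    # The monitor fails early (illustrative error message).
    fut.set_exception(RuntimeError("cursor killed"))
    # The participant keeps working, unaware of the failure.
    participant.run_step("clone")
    participant.run_step("apply oplog")
    try:
        participant.await_monitor_completed()
    except RuntimeError as exc:
        return participant.steps_run, str(exc)
    return participant.steps_run, None
```

      In this sketch both steps run to completion even though the monitor had already failed before the first one started.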

      SERVER-104946 will commit a unit test demonstrating this behavior on the donor. A possible solution is to add a background task that watches the change streams monitor future and handles errors immediately.
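
      One way to picture the proposed direction is a completion callback attached to the monitor future. The sketch below is in Python with the standard concurrent.futures.Future; WatchedParticipant, its methods, and the step names are hypothetical, and the real fix would use the server's own future and executor facilities:

```python
from concurrent.futures import Future


class WatchedParticipant:
    """Hypothetical participant that reacts to monitor failure right away."""

    def __init__(self):
        self.steps_run = []
        self.monitor_error = None

    def start_monitor(self, monitor_future):
        # Register a callback instead of only storing the future.
        monitor_future.add_done_callback(self._on_monitor_done)

    def _on_monitor_done(self, fut):
        # Runs as soon as the future completes (or at once if already done).
        if fut.cancelled():
            return
        exc = fut.exception()
        if exc is not None:
            self.monitor_error = exc

    def run_step(self, name):
        # Abort before doing further work if the monitor has failed.
        if self.monitor_error is not None:
            raise self.monitor_error
        self.steps_run.append(name)


def demo_immediate_detection():
    fut = Future()
    participant = WatchedParticipant()
    participant.start_monitor(fut)
    participant.run_step("clone")
    # The monitor fails between steps (illustrative error message).
    fut.set_exception(RuntimeError("cursor killed"))
    try:
        participant.run_step("apply oplog")
    except RuntimeError as exc:
        return participant.steps_run, str(exc)
    return participant.steps_run, None
```

      Here the second step never runs, because the failure is recorded the moment the monitor future completes. How the participant should actually react (abort, retry, or report to the coordinator) is left open by this sketch.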

              Assignee:
              Kruti Shah
              Reporter:
              Kruti Shah