Resharding participants don't handle change streams monitor failures immediately


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL

      Resharding donors and recipients start change stream monitors, but they do not immediately handle monitor failures. Instead, errors remain unchecked until the participants later call awaitChangeStreamsMonitorCompleted(). This delays error detection and can let operations continue, only to fail later.

      Current Behavior:

      • When a participant starts monitoring, the changeStreamsMonitor returns a SemiFuture.
      • The returned future is stored but not immediately checked for errors.
      • If the monitor fails (e.g., due to network issues or cursor problems), the error sits unobserved in the completed future.
      • The donor or recipient continues with other operations, unaware of the monitor failure.
      • The error only surfaces much later, when awaitChangeStreamsMonitorCompleted() is called.
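
The current behavior can be modeled with a minimal sketch. It assumes std::promise/std::future as a stand-in for the server's SemiFuture; FakeChangeStreamsMonitor, Participant, doOtherWork() and monitorResultReady() are hypothetical names for illustration, and only awaitChangeStreamsMonitorCompleted() comes from this ticket. The point is that the failed result is already ready while the participant keeps working, and nothing observes it until the await call.

```cpp
#include <chrono>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the change streams monitor. std::future is used
// in place of the server's SemiFuture; this is not the real monitor API.
struct FakeChangeStreamsMonitor {
    std::promise<void> promise;

    std::future<void> startMonitoring() { return promise.get_future(); }

    // Simulates a monitor failure such as a network or cursor error.
    void fail(const std::string& reason) {
        promise.set_exception(std::make_exception_ptr(std::runtime_error(reason)));
    }
};

// Hypothetical participant mirroring the current behavior: the future is
// stored at startup and only inspected in awaitChangeStreamsMonitorCompleted().
class Participant {
public:
    explicit Participant(FakeChangeStreamsMonitor& monitor)
        : _monitorFuture(monitor.startMonitoring()) {}

    // Stand-in for the participant's other work; it never looks at
    // _monitorFuture, so a failed monitor goes unnoticed here.
    void doOtherWork(const std::string& step) { _completedSteps.push_back(step); }

    // True once the monitor has finished (successfully or with an error),
    // even though nobody has retrieved the result yet.
    // Must not be called after awaitChangeStreamsMonitorCompleted().
    bool monitorResultReady() const {
        return _monitorFuture.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // The first point at which the stored error is observed. Returns the
    // error message, or an empty string if the monitor succeeded.
    std::string awaitChangeStreamsMonitorCompleted() {
        try {
            _monitorFuture.get();
            return "";
        } catch (const std::exception& ex) {
            return ex.what();
        }
    }

    const std::vector<std::string>& completedSteps() const { return _completedSteps; }

private:
    std::future<void> _monitorFuture;
    std::vector<std::string> _completedSteps;
};
```

In this model, a monitor that fails before the participant's next step still lets every later doOtherWork() call proceed; the failure is reported only by the final await.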

      SERVER-104946 will commit a unit test demonstrating this behavior on the donor. A possible solution is to add a background task that monitors the change streams monitor future and handles errors as soon as they occur.

            Assignee:
            Kruti Shah
            Reporter:
            Kruti Shah
            Votes:
            0
            Watchers:
            3
