Replace timeout-based waitForDone with heartbeat-driven progress tracking in change stream test Connector and allow fsm tests to use multiple parallel writers

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • Fully Compatible
    • QE 2026-03-30, QE 2026-04-13

      Connector.waitForDone() currently polls a binary done flag via assert.soonNoExcept with a fixed timeout (10s local / 10min Evergreen). For slow operations like parallel DDL writers, this forces callers to pass ad-hoc timeout overrides tied to the environment — a fragile pattern that doesn't distinguish "slow but alive" from "hung."

      Problem:

      • Background Writer/Reader threads log per-command/per-event progress, but the Connector has no visibility into this progress.
      • A hung thread wastes the full timeout before failing; a slow-but-alive thread may exceed any fixed timeout.
      • Environment-specific timeout hacks (TestData?.inEvergreen ? undefined : 5 * 60 * 1000) are brittle and don't solve the root cause.

      Solution:
      Add a heartbeat mechanism in which each background thread increments a sequence counter in the notifications collection after every unit of work (a command executed or an event read). Rewrite waitForDone() as a loop that calls assert.soonNoExcept once per heartbeat increment, so each call is bounded by the duration of a single command. If a single command hangs, the default assert.soon timeout expires and triggers the hang analyzer. The total wait is unbounded as long as commands keep completing.
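
The loop described above might look roughly like the following sketch. It is not the actual Connector code: every name in it (makeBoundedSoon, readState, makeSimulatedThread) is hypothetical. The notifications collection is replaced by an in-memory function returning {seq, done}, and assert.soonNoExcept is replaced by a stand-in that counts attempts rather than measuring time, so that the example runs outside the mongo shell.

```javascript
// Sketch only: all names below are hypothetical, not taken from the real Connector.

// Stand-in for assert.soonNoExcept: retry the predicate a bounded number of
// times, then throw. The real helper is time-based, not attempt-based.
function makeBoundedSoon(maxAttempts) {
    return function soon(predicate, msg) {
        for (let attempt = 0; attempt < maxAttempts; attempt++) {
            if (predicate()) {
                return;
            }
        }
        throw new Error("timed out: " + msg);
    };
}

// readState() stands in for reading {seq, done} from the notifications
// collection. Each soon() call only has to outlast one unit of work, so the
// total wait is unbounded as long as the counter keeps advancing.
function waitForDone(readState, soon) {
    let lastSeq = readState().seq;
    for (;;) {
        let state;
        soon(() => {
            state = readState();
            return state.done || state.seq > lastSeq;
        }, "no heartbeat after seq " + lastSeq);
        if (state.done) {
            return state.seq;
        }
        lastSeq = state.seq;
    }
}

// Simulated background thread: completes one unit of work every
// `pollsPerUnit` reads and reports done after `totalUnits` units.
function makeSimulatedThread(totalUnits, pollsPerUnit) {
    let polls = 0;
    return function readState() {
        polls++;
        const seq = Math.min(totalUnits, Math.floor(polls / pollsPerUnit));
        return {seq: seq, done: seq >= totalUnits};
    };
}
```

With a bound of 5 attempts per wait, waitForDone(makeSimulatedThread(50, 3), makeBoundedSoon(5)) finishes even though the whole run takes far more than 5 polls, because the counter advances on every third poll. Passing Infinity as pollsPerUnit simulates a hung thread: the counter never moves and the first bounded wait throws.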

      Acceptance criteria:

      1. No caller of waitForDone needs to pass custom timeouts
      2. If a single command/event hangs, assert.soonNoExcept times out within the default window and fires the hang analyzer
      3. Parallel DDL writers that make steady progress complete without timeout, regardless of cumulative duration
      4. Existing tests pass without modification (beyond removing the timeout hack)

            Assignee:
            Nicola Cabiddu
            Reporter:
            Denis Grebennicov
