- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Execution
- Fully Compatible
- QE 2026-03-30, QE 2026-04-13
Connector.waitForDone() currently polls a binary done flag via assert.soonNoExcept with a fixed timeout (10 s locally, 10 min in Evergreen). For slow operations such as parallel DDL writers, this forces callers to pass ad-hoc timeout overrides tied to the environment. That pattern is fragile and cannot distinguish "slow but alive" from "hung."
Problem:
- Background Writer/Reader threads log per-command/per-event progress, but the Connector has no visibility into this progress.
- A hung thread wastes the full timeout before failing; a slow-but-alive thread may exceed any fixed timeout.
- Environment-specific timeout hacks (TestData?.inEvergreen ? undefined : 5 * 60 * 1000) are brittle and don't solve the root cause.
Solution:
Add a heartbeat mechanism in which background threads increment a sequence counter in the notifications collection after each unit of work (a command executed or an event read). Rewrite waitForDone() to loop once per heartbeat increment, calling assert.soonNoExcept on each iteration, so that each call is bounded by the duration of a single command. If a single command hangs, the default assert.soon timeout expires and fires the hang analyzer. Total wait time is unbounded as long as commands keep completing.
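A minimal sketch of the per-heartbeat wait loop, written as plain JavaScript with a simulated clock so that it runs outside the test harness. It is not the actual Connector code: makeClock, simulatedWorker, soon, and the field names seq and done are all hypothetical, and soon is only a simplified stand-in for assert.soon. In the real harness the poll would presumably go through assert.soonNoExcept and read the counter from the notifications collection.

```javascript
// Sketch only: plain JavaScript with a simulated clock, not the real test harness.
// All names here (makeClock, simulatedWorker, soon, seq, done) are hypothetical.

// Fake clock so the example runs instantly and deterministically.
function makeClock() {
  let t = 0;
  return {
    now: () => t,
    sleep: (ms) => { t += ms; },
  };
}

// Stand-in for reading the notification document. Each entry of durationsMs is
// how long one command takes; seq counts the commands finished so far.
function simulatedWorker(durationsMs, clock) {
  return () => {
    let elapsed = 0;
    let seq = 0;
    for (const d of durationsMs) {
      elapsed += d;
      if (elapsed > clock.now()) break;
      seq += 1;
    }
    return { seq, done: seq === durationsMs.length };
  };
}

// Simplified stand-in for assert.soon: poll until pred() holds or time runs out.
function soon(pred, timeoutMs, clock, intervalMs = 100) {
  const deadline = clock.now() + timeoutMs;
  while (!pred()) {
    if (clock.now() >= deadline) {
      throw new Error(`no progress within ${timeoutMs} ms`);
    }
    clock.sleep(intervalMs);
  }
}

// Wait one heartbeat at a time: each soon() call only has to outlast a single
// command, so total wait is unbounded while the counter keeps advancing.
function waitForDone(readNotification, perStepTimeoutMs, clock) {
  let lastSeq = readNotification().seq;
  for (;;) {
    let latest;
    soon(() => {
      latest = readNotification();
      return latest.done || latest.seq > lastSeq;
    }, perStepTimeoutMs, clock);
    if (latest.done) return latest.seq;
    lastSeq = latest.seq;
  }
}

// Slow but alive: 20 commands of 4 s each (80 s total) with a 10 s per-step limit.
const slowClock = makeClock();
const slow = simulatedWorker(Array(20).fill(4000), slowClock);
console.log(waitForDone(slow, 10000, slowClock)); // 20

// Hung: the second command never finishes, so the wait fails after one window.
const hungClock = makeClock();
const hung = simulatedWorker([4000, Infinity], hungClock);
try {
  waitForDone(hung, 10000, hungClock);
} catch (e) {
  console.log(e.message); // no progress within 10000 ms
}
```

In this simulation the 80-second workload completes under a 10-second per-step limit, while the hung worker fails one timeout window after its last heartbeat instead of consuming a large fixed budget.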
Acceptance criteria:
- No caller of waitForDone needs to pass custom timeouts
- If a single command/event hangs, assert.soonNoExcept times out within the default window and fires the hang analyzer
- Parallel DDL writers that make steady progress complete without timeout, regardless of cumulative duration
- Existing tests pass without modification (beyond removing the timeout hack)