Bound the poll loops in test_layered_checkpoint08 so it fails fast instead of hanging


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0, 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: Test Python
    • None
    • Storage Engines, Storage Engines - Persistence
    • 0.001
    • SE Persistence backlog
    • None

      Motivation

      test_layered_checkpoint08 contains two unbounded "while True" poll loops. When the engine fails to make progress (see WT-17700, where the dhandle sweep server stalled for 60+ minutes in the palite/disaggregated config), the test spins silently until a task-level timeout fires: either the 2h Evergreen idle timeout or the 1800s per-test timeout in the parallel_checkpoint hook task (WT-17687). This wastes CI time and produces a misleading "task-timed-out" signature instead of a clear test failure.

      Scope

      This ticket is the test-robustness follow-up only. The substantive engine bug — why the sweep server stalled in the palite config — is tracked separately under WT-17700 (Foundations).

      Approach

      Bound both poll loops with a 60-second deadline and a descriptive assertion:
      * The sweep-wait loop (dh_sweep_dead_close). With close_scan_interval=1 and close_idle_time=1 a dead handle is closed within a couple of sweep cycles (~1-2s), so 60s is generous even under ASAN/loaded CI while giving ~120x faster feedback than the 2h timeout.
      * The wait_for_checkpoint_start() loop (checkpoint_state). The checkpoint thread is already running and the checkpoint takes >=10s due to timing_stress_for_test=[checkpoint_slow], so the state should flip within a fraction of a second, well inside the 60s bound.
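
      The bounded-poll pattern described above can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name wait_until, its parameters, and the stand-in predicate are hypothetical, and the real test would instead read the dh_sweep_dead_close or checkpoint_state statistic inside the predicate.

```python
import time


def wait_until(predicate, description, timeout=60.0, interval=0.1):
    """Poll predicate() until it returns True, or fail once the deadline passes.

    Hypothetical helper for illustration; not part of the WiredTiger test suite.
    """
    # Use a monotonic clock so wall-clock adjustments cannot extend the wait.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            # A descriptive failure replaces the silent spin of "while True".
            raise AssertionError(
                f"timed out after {timeout:.0f}s waiting for {description}")
        time.sleep(interval)


# Stand-in for a statistic read: reports progress on the third poll.
polls = {"count": 0}


def sweep_closed_handle():
    polls["count"] += 1
    return polls["count"] >= 3


wait_until(sweep_closed_handle, "dhandle sweep to close the dead handle",
           timeout=5.0, interval=0.01)
```

      Raising AssertionError explicitly, rather than using a bare assert statement, keeps the check active even if Python runs with optimizations enabled. Inside a unittest-style test the same effect could be had with self.fail().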

      Definition of done
      * Both loops in test_layered_checkpoint08 bounded with a clear assertion on timeout.
      * Test still passes on the happy path.

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel
