- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Test Python
- None
- Storage Engines, Storage Engines - Persistence
- 0.001
- SE Persistence backlog
- None
Motivation
test_layered_checkpoint08 contains two unbounded while True poll loops. When the engine fails to make progress (see WT-17700, where the dhandle sweep server stalled for 60+ minutes in the palite/disaggregated config), the test spins silently until a task-level timeout fires: either the 2h Evergreen idle timeout or the 1800s per-test timeout in the parallel_checkpoint hook task (WT-17687). This wastes CI time and produces a misleading "task-timed-out" signature instead of a clear test failure.
Scope
This ticket is the test-robustness follow-up only. The substantive engine bug — why the sweep server stalled in the palite config — is tracked separately under WT-17700 (Foundations).
Approach
Bound both poll loops with a 60-second deadline and a descriptive assertion:
* The sweep-wait loop (dh_sweep_dead_close). With close_scan_interval=1 and close_idle_time=1 a dead handle is closed within a couple of sweep cycles (~1-2s), so 60s is generous even under ASAN/loaded CI while giving ~120x faster feedback than the 2h timeout.
* The wait_for_checkpoint_start() loop (checkpoint_state). The checkpoint thread is already running and the checkpoint takes >=10s due to timing_stress_for_test=[checkpoint_slow], so the state should flip within a fraction of a second.
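A minimal sketch of the bounded-poll pattern described above. The helper name wait_until, its parameters, and the fake_stat stand-in are hypothetical and used only for illustration; they are not taken from the WiredTiger test suite, and the real change may simply inline a deadline check in each loop.

```python
import time

def wait_until(predicate, timeout=60.0, interval=0.1, message="condition not met"):
    """Poll predicate() until it is truthy, or fail once the deadline passes.

    Hypothetical helper: replaces an unbounded `while True` poll loop with
    one that raises a descriptive AssertionError on timeout.
    """
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"{message} (waited {timeout}s)")
        time.sleep(interval)

# Stand-in for reading a statistic such as dh_sweep_dead_close; in the real
# test this would query the engine rather than count calls.
reads = {"count": 0}

def fake_stat():
    reads["count"] += 1
    return reads["count"] >= 3

wait_until(fake_stat, timeout=1.0, interval=0.01,
           message="sweep did not close the dead handle")
```

With this shape, a stalled sweep server or a checkpoint that never starts would surface as a test failure carrying the supplied message instead of a task-level timeout.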
Definition of done
* Both loops in test_layered_checkpoint08 bounded with a clear assertion on timeout.
* Test still passes on the happy path.
is related to
* WT-17722 Add a reusable bounded wait_for_sweep test helper (Open)
* WT-17721 Add a reusable bounded wait_for_checkpoint_start test helper (Closed)
related to
* WT-17700 [Layered] test_layered_checkpoint08: hang in palite config (service threads parked) causes 2h idle timeout (Open)
* WT-17735 test/format (mode=switch) assertion in __rec_split_write (Open)
* WT-17687 failed: unit-test-hook-parallel-checkpoint-bucket10, parallel checkpoint UT timeout (Closed)