Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Test Format
Storage Engines - Foundations
90.104
Issue Summary
A history store (HS) verification failure occurs during wiredtiger_open in follower mode when running the disaggregated storage (disagg) switch test in test/format. The failure occurs because the shared history store is verified without a checkpoint having been picked up, which leaves the global timestamp and disaggregated storage values inconsistent (e.g., last_checkpoint_timestamp is WT_TS_NONE).
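The inconsistent state described in the summary can be expressed as a simple diagnostic check. This is a minimal sketch, not WiredTiger code: TS_NONE is a local stand-in for WT_TS_NONE, and follower_timestamps_inconsistent and its global_stable_ts parameter are hypothetical names for a global timestamp restored from local metadata and the turtle file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for WT_TS_NONE ("no timestamp"). */
#define TS_NONE ((uint64_t)0)

/*
 * Hypothetical diagnostic: returns true when a global timestamp has been
 * restored (e.g. from local metadata and the turtle file) while no checkpoint
 * timestamp has been picked up, the combination described in this ticket.
 */
static bool
follower_timestamps_inconsistent(uint64_t global_stable_ts, uint64_t last_checkpoint_ts)
{
    return global_stable_ts != TS_NONE && last_checkpoint_ts == TS_NONE;
}
```

A check like this could be logged or asserted before any shared-data verification runs, so that the failure points at the missing checkpoint rather than at the history store contents.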
Context
- test/format does not delete local files on startup, so local tables and metadata from previous runs are retained.
- The global timestamps in txn_global appear to be set from the local metadata table and turtle file, not from an actual checkpoint.
- Even though no checkpoint has been picked up, the shared history store is accessible through the retained local metadata.
- The typical MongoDB sequence is to pick up a checkpoint when starting as a follower, but test/format does not do this automatically.
- As a result, shared data can be accessed without a checkpoint, which is fundamentally incorrect.
- Workarounds such as gating operations on last_checkpoint_meta_lsn != WT_DISAGG_LSN_NONE or passing the checkpoint_meta configuration have been suggested, but the underlying issue remains.
- The discussion suggests that test/format should either pick up a checkpoint automatically during wiredtiger_open, or gate operations and abort if no checkpoint has been picked up.
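The gating workaround can be sketched as a precheck that runs before the shared history store is verified. This is a hypothetical sketch: struct disagg_state, disagg_checkpoint_picked_up, hs_verify_precheck and the DISAGG_LSN_NONE stand-in (for WT_DISAGG_LSN_NONE, whose real value is defined in WiredTiger's headers) are assumptions, and returning EINVAL is only one possible way to refuse.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for WT_DISAGG_LSN_NONE; the real value is an assumption here. */
#define DISAGG_LSN_NONE ((uint64_t)0)

/* Hypothetical view of the connection state relevant to this ticket. */
struct disagg_state {
    uint64_t last_checkpoint_meta_lsn; /* LSN of the picked-up checkpoint metadata */
};

/* A checkpoint counts as picked up once its metadata LSN has been set. */
static bool
disagg_checkpoint_picked_up(const struct disagg_state *s)
{
    return s->last_checkpoint_meta_lsn != DISAGG_LSN_NONE;
}

/*
 * Run before verifying the shared history store: refuse instead of reading
 * shared data against timestamps that came only from local metadata.
 */
static int
hs_verify_precheck(const struct disagg_state *s)
{
    if (!disagg_checkpoint_picked_up(s))
        return EINVAL;
    return 0;
}
```

Returning an error keeps the decision with the caller; a stricter variant would abort at this point, matching the "gate/abort" option.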
Proposed Solution
- Modify wiredtiger_open in test/format to automatically pick up a checkpoint if local metadata is retained and no checkpoint has been picked up.
- Alternatively, gate operations on the presence of a valid checkpoint (e.g., last_checkpoint_meta_lsn != WT_DISAGG_LSN_NONE), and abort/crash if this condition is not met, so that shared data is never accessed without a checkpoint.
- Review and update the test/format startup sequence so that checkpoints are handled correctly and the connection never starts in an inconsistent state.
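The two proposed options can be combined into a single follower startup decision. In this hypothetical sketch, follower_startup_action, the startup_action enum, the auto_pickup switch and the DISAGG_LSN_NONE stand-in are illustrative names, not existing test/format or WiredTiger APIs; the actual checkpoint pickup would still have to be performed by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for WT_DISAGG_LSN_NONE; the real value is an assumption here. */
#define DISAGG_LSN_NONE ((uint64_t)0)

enum startup_action {
    STARTUP_PROCEED,           /* safe to continue opening as a follower */
    STARTUP_PICKUP_CHECKPOINT, /* option 1: pick up a checkpoint before any shared access */
    STARTUP_ABORT              /* option 2: refuse to run without a checkpoint */
};

/* Decide what a follower should do at startup, following the proposed options. */
static enum startup_action
follower_startup_action(bool local_metadata_retained, uint64_t last_checkpoint_meta_lsn,
  bool auto_pickup)
{
    /* A checkpoint has already been picked up: shared data is consistent. */
    if (last_checkpoint_meta_lsn != DISAGG_LSN_NONE)
        return STARTUP_PROCEED;

    /* Assumption: without retained local metadata, shared tables are not reachable. */
    if (!local_metadata_retained)
        return STARTUP_PROCEED;

    /* Retained local metadata but no checkpoint: the state described in this ticket. */
    return auto_pickup ? STARTUP_PICKUP_CHECKPOINT : STARTUP_ABORT;
}
```

Treating the no-retained-metadata case as safe to proceed rests on the observation above that shared data is only reachable through the retained local metadata; if that does not hold, that branch should also pick up a checkpoint or abort.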
Original Slack thread
This ticket was generated by AI from a Slack thread.
Is related to:
- WT-17418 Consider supporting local tables on disaggregated storage connections
Backlog