- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: PALite
- Storage Engines - Foundations, Storage Engines - Persistence
- SE Persistence backlog
Issue Summary
Unit tests are failing with the PALite assertion error "Multiple discarded pages in chain". Investigation showed that the error occurs while running shutdown checkpoints and specifically involves root pages. The assertion was introduced in WT-15693 and is now triggering in multiple places, including recent PRs such as WT-16252.
Context
- The error log shows:
Multiple discarded pages in chain: {table_id=25, page_id=101, lsn=8, backlink_lsn=2, base_lsn=2, flags=0x10002}, {table_id=25, page_id=101, lsn=16, backlink_lsn=2, base_lsn=2, flags=0x10002}
- The leaked page is confirmed to be a root page: page_id=101 in the assertion matches root_id=101 in the checkpoint log:
[1773104026:079037][59025:0x16b7ab000], file:test_prepare_discover08.wt_stable, disagg-drain: [WT_VERB_DISAGGREGATED_STORAGE][DEBUG_1]: Loading checkpoint: root_id=101 flags=0 lsn=2 base_lsn=0 root_size=64 root_checksum=f65c4467
- The issue appears during the shutdown checkpoint in disaggregated storage mode. There is disagreement on whether shutdown checkpoints should run at all in this mode, but skipping them would not explain why the root page is freed twice.
- No existing Jira ticket tracks this issue, and it is appearing in multiple recent tests and PRs.
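For triage across the affected tests, the assertion message can be parsed to confirm which page appears more than once in the chain. The following is a minimal sketch, not part of PALite or the WiredTiger test suite: the helper names (parse_records, duplicated_pages) are hypothetical, and it assumes the message keeps the {key=value, ...} layout shown in the log above.

```python
import re
from collections import defaultdict

# Matches one "{key=value, ...}" record in the assertion message.
# Assumption: records never contain nested braces, as in the log above.
RECORD_RE = re.compile(r"\{([^{}]*)\}")


def parse_records(message):
    """Return one dict per {...} record; all values are kept as strings."""
    records = []
    for body in RECORD_RE.findall(message):
        fields = {}
        for pair in body.split(","):
            key, _, value = pair.strip().partition("=")
            fields[key] = value
        records.append(fields)
    return records


def duplicated_pages(records):
    """Group records by (table_id, page_id); keep groups with more than one entry."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["table_id"], rec["page_id"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# The assertion message copied from the error log in this ticket.
MESSAGE = (
    "Multiple discarded pages in chain: "
    "{table_id=25, page_id=101, lsn=8, backlink_lsn=2, base_lsn=2, flags=0x10002}, "
    "{table_id=25, page_id=101, lsn=16, backlink_lsn=2, base_lsn=2, flags=0x10002}"
)

dups = duplicated_pages(parse_records(MESSAGE))
for (table_id, page_id), recs in dups.items():
    # Print the page identity and the LSN of each discarded entry for that page.
    print(table_id, page_id, [r["lsn"] for r in recs])
```

For the message above, this reports table 25, page 101 with discarded entries at LSNs 8 and 16, both carrying backlink_lsn=2 and base_lsn=2. Running the same helper over logs from other failing tests would show whether the duplicated page is always the checkpoint root.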
Proposed Solution
- Investigate why root pages are being freed twice during shutdown checkpoint.
- Determine if shutdown checkpoints should be skipped in disaggregated storage mode, and document the rationale.
- Provide a workaround for affected tests, if possible.
- Track this issue with a Jira ticket and coordinate with the persistence team for resolution.
Original Slack thread: Slack Thread
This ticket was generated by AI from a Slack thread.