
Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Btree
Labels: None
Assigned Teams: Storage Engines - Persistence
Total Hours with Assigned Team: 1,609.583
Epic Link: Investigation of spurious errors related to the block manager
Sprint: SE Persistence backlog
Story Points: None

Motivation

WT-14750 / PR #12079 added a write-side guard in __rec_write that rejects a disk image whose dsk->type is WT_PAGE_INVALID or >= WT_PAGE_TYPE_COUNT. That stops bad pages from being persisted going forward, but it does not help us diagnose failures on the read side.

We just hit such a failure in production on mongod 8.0.23:

__wt_btcur_next:878: encountered an illegal file format or internal value: 0x0
__wt_btcur_next:878: the process must exit and restart   (WT_PANIC, -31804)
dhandle: file:index-25--5608028140981202196.wt
session: WT_CURSOR.next

The panic fires from the default: arm of the switch (page->type) in bt_curnext.c:878. By the time we get there, all we know is that the type byte was 0x0. We have no block address, no checksum, and no way to tell whether the bad image came off disk or was clobbered in memory. The customer ticket cannot move forward without that information.

Proposal

Two coordinated changes:

1. Read-side mirror of WT-14750 — validate page type at materialization

In __wt_page_inmem (and ideally also __wt_verify_dsk_image / __wt_bt_read), reject any disk image with:

if (dsk->type == WT_PAGE_INVALID || dsk->type >= WT_PAGE_TYPE_COUNT)
    WT_RET(__wt_illegal_value(session, dsk->type));

This catches the invalid type byte the moment the page is read off disk, while we still have the block address cookie and checksum in scope. Today the failure surfaces later in __wt_btcur_next, by which point that provenance is gone.

2. Dump the page header and raw block on the illegal-value path

Mirror what __wt_bm_corrupt_dump does for checksum mismatches. When an illegal page type is detected (either from the new read-side check, or from any of the existing switch (page->type) defaults in the cursor walkers), emit a diagnostic block that logs:

- dhandle name
- ref->addr: block offset, size, and stored checksum
- The raw on-disk block, re-read from that address
- WT_PAGE_HEADER of the in-memory image: recno, write_gen, mem_size, oflags, type, version, u.entries
- page->memory_footprint, page->modify state (clean vs. dirty), and whether the page was just built or read from disk
- A hex dump of the first 256 bytes of page->dsk
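The dump could be assembled along these lines. This is a sketch only: struct demo_page_diag is a hypothetical flattened record of the values listed above, not the real WT_PAGE_HEADER or WT_REF layout, and a real implementation would more likely go through WiredTiger's own message paths than snprintf into a caller buffer. For brevity it covers the address and header fields plus the hex dump capped at 256 bytes; the footprint and modify state would be added the same way.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical flattened record of the values the proposal wants logged. This is
 * not WiredTiger's WT_PAGE_HEADER or WT_REF layout.
 */
struct demo_page_diag {
    const char *dhandle_name;
    uint64_t addr_offset;   /* from ref->addr */
    uint32_t addr_size;     /* from ref->addr */
    uint32_t addr_checksum; /* stored checksum from ref->addr */
    uint64_t recno;
    uint64_t write_gen;
    uint32_t mem_size;
    uint32_t entries; /* u.entries */
    uint8_t type;
    uint8_t version;
};

/* Write up to "len" bytes of "p" as hex, 16 per line, never overflowing "buf". */
static void
demo_hex_dump(char *buf, size_t bufsz, const uint8_t *p, size_t len)
{
    size_t i, n = 0;

    assert(bufsz > 0);
    buf[0] = '\0';
    /* Each byte needs 3 chars plus the terminating NUL; stop when that no longer fits. */
    for (i = 0; i < len && bufsz - n >= 4; i++) {
        int w = snprintf(buf + n, bufsz - n, "%02x%c", (unsigned)p[i], (i % 16 == 15) ? '\n' : ' ');
        if (w < 0)
            break;
        n += (size_t)w;
    }
}

/* Format the address, header fields, and a capped hex dump; returns length or -1. */
static int
demo_format_diag(
  char *buf, size_t bufsz, const struct demo_page_diag *d, const uint8_t *dsk, size_t dsk_len)
{
    size_t dump_len = dsk_len < 256 ? dsk_len : 256; /* proposal: first 256 bytes */
    int n = snprintf(buf, bufsz,
      "%s: addr offset=%" PRIu64 " size=%" PRIu32 " checksum=0x%08" PRIx32
      "; header recno=%" PRIu64 " write_gen=%" PRIu64 " mem_size=%" PRIu32 " entries=%" PRIu32
      " type=%u version=%u\n",
      d->dhandle_name != NULL ? d->dhandle_name : "(unknown)", d->addr_offset, d->addr_size,
      d->addr_checksum, d->recno, d->write_gen, d->mem_size, d->entries, (unsigned)d->type,
      (unsigned)d->version);

    if (n < 0 || (size_t)n >= bufsz)
        return (-1); /* the header line alone did not fit */
    demo_hex_dump(buf + n, bufsz - (size_t)n, dsk, dump_len);
    return ((int)strlen(buf));
}
```

Keeping the header fields and the hex bytes in one formatted record means a single log line carries everything needed to compare against the re-read block.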

The key forensic question this answers in one log line: was the corruption on disk (the re-read matches the bad image) or in memory (the re-read is fine)? That determines whether we suspect persistence, cache corruption or use-after-free (UAF), or a hardware bit flip.
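The on-disk vs. in-memory decision reduces to comparing the re-read block with the in-memory image. A minimal sketch, assuming the re-read bytes have already gone through the same decompression/decryption as page->dsk, and that any header fields the read path legitimately rewrites are masked or excluded from len beforehand. demo_classify and its enum are hypothetical names, not WiredTiger functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts for the on-disk vs. in-memory question. */
enum demo_corruption_site {
    DEMO_CORRUPT_ON_DISK,   /* re-read reproduces the bad image: the bad bytes were persisted */
    DEMO_CORRUPT_IN_MEMORY, /* re-read differs: cache corruption, UAF, or an in-memory bit flip */
    DEMO_CORRUPT_UNKNOWN    /* re-read failed: no conclusion possible */
};

/*
 * Compare the in-memory image with a fresh, already-decoded re-read of the same
 * block address. Assumes both buffers hold at least "len" comparable bytes.
 */
static enum demo_corruption_site
demo_classify(const uint8_t *mem_image, const uint8_t *reread_image, size_t len, int reread_ok)
{
    assert(mem_image != NULL);
    if (!reread_ok || reread_image == NULL)
        return (DEMO_CORRUPT_UNKNOWN);
    return (memcmp(mem_image, reread_image, len) == 0 ? DEMO_CORRUPT_ON_DISK :
                                                        DEMO_CORRUPT_IN_MEMORY);
}
```

If a whole-image comparison turns out to be noisy, comparing only the page header (or just the type byte) answers the narrower question directly.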

Definition of done

- Read-side type validation in place, with a csuite test that injects a bad type byte into a written page and verifies that the panic now fires from __wt_page_inmem with full context, not from __wt_btcur_next.
- Diagnostic dump emitted on the illegal-value path, including the re-read of the on-disk block.
- Log output reviewed to confirm it includes enough information to distinguish on-disk from in-memory corruption without further customer interaction.

References

- WT-14750 / PR #12079: write-side type check (this is the read-side complement).
- In-progress investigation: SE Persistence triage of a WT_PANIC from __wt_btcur_next:878 on mongod 8.0.23 (cluster trx-sharded-shard-03-01-wueoy.mongodb.net, 2026-05-25).

Issue Links

is related to
- WT-14750 Failure in __wt_page_inmem: "encountered an illegal file format or internal value: 0x0" (Closed)
- WT-17081 Dump the page image if decoding the page contents fails (Closed)

related to
- WT-17659 Rename numbered layered/disagg Python tests to descriptive names — Pt 3: Schema/config (Closed)
- WT-17661 Validate page->type at every cursor/search switch via shared __wt_page_type_valid helper (Open)

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Etienne Petrel
Votes: 0
Watchers: 3

Created: May 26 2026 09:13:16 PM UTC
Updated: May 29 2026 12:28:18 AM UTC
