Improve diagnostics on illegal page type: validate at read time and dump page/block on failure


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Btree

      Motivation

      WT-14750 / PR #12079 added a write-side guard in __rec_write that rejects a disk image whose dsk->type is WT_PAGE_INVALID or >= WT_PAGE_TYPE_COUNT. That stops bad pages from being persisted going forward, but it does not help us diagnose failures on the read side.

      We just hit such a failure in production on mongod 8.0.23:

      __wt_btcur_next:878: encountered an illegal file format or internal value: 0x0
      __wt_btcur_next:878: the process must exit and restart   (WT_PANIC, -31804)
      dhandle: file:index-25--5608028140981202196.wt
      session: WT_CURSOR.next
      

      The panic fires from the default: arm of the switch (page->type) in bt_curnext.c:878. By the time we get there, all we know is "the type byte was 0x0". We have no block address, no checksum, and no idea whether the bad image came off disk or was clobbered in memory. The customer ticket cannot make progress without that information.

      Proposal

      Two coordinated changes:

      1. Read-side mirror of WT-14750 — validate page type at materialization

      In __wt_page_inmem (and ideally __wt_verify_dsk_image / __wt_bt_read), reject any disk image with:

      if (dsk->type == WT_PAGE_INVALID || dsk->type >= WT_PAGE_TYPE_COUNT)
          WT_RET(__wt_illegal_value(session, dsk->type));
      

      This catches the invalid type byte the moment the page is read off disk, while we still have the block address cookie and checksum in scope. Today the failure surfaces later in __wt_btcur_next, by which point that provenance is gone.
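
A minimal, self-contained sketch of the read-side check is shown here for illustration. The demo_* types, constants, and return codes are hypothetical stand-ins, not WiredTiger's real definitions; the point is only that the check runs while the block address is still available to log.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in page type values (assumed for this sketch, not copied from WiredTiger headers). */
enum { DEMO_PAGE_INVALID = 0, DEMO_PAGE_ROW_LEAF = 7, DEMO_PAGE_TYPE_COUNT = 8 };

/* Hypothetical, trimmed-down views of a disk image header and its block address. */
typedef struct {
    uint8_t type;
} demo_page_header;

typedef struct {
    uint64_t offset;
    uint32_t size;
    uint32_t checksum;
} demo_block_addr;

/*
 * demo_validate_page_type --
 *     Return 0 if the type is legal, -1 otherwise. On failure, log the block address, which is
 *     still in scope at read time.
 */
static int
demo_validate_page_type(const demo_page_header *dsk, const demo_block_addr *addr)
{
    assert(dsk != NULL && addr != NULL);

    if (dsk->type == DEMO_PAGE_INVALID || dsk->type >= DEMO_PAGE_TYPE_COUNT) {
        fprintf(stderr,
          "illegal page type 0x%x: offset %" PRIu64 ", size %" PRIu32 ", checksum 0x%" PRIx32 "\n",
          (unsigned)dsk->type, addr->offset, addr->size, addr->checksum);
        return (-1);
    }
    return (0);
}
```

In the real change the failure path would presumably go through __wt_illegal_value as in the snippet above; the fprintf here only stands in for that.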

      2. Dump the page header and raw block on the illegal-value path

      Mirror what __wt_bm_corrupt_dump does for checksum mismatches. When an illegal page type is detected (either from the new read-side check, or from any of the existing switch (page->type) defaults in the cursor walkers), emit a diagnostic block that logs:

      • dhandle name
      • ref->addr — block offset, size, and stored checksum
      • The raw on-disk block, re-read from that address
      • WT_PAGE_HEADER of the in-memory image: recno, write_gen, mem_size, oflags, type, version, u.entries
      • page->memory_footprint, page->modify state (clean vs. dirty), and whether the page was just built or read from disk
      • First 256 bytes hex of page->dsk
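
As a rough illustration of the last bullet, the hex dump could be produced along these lines. demo_hex_dump and DEMO_DUMP_MAX are hypothetical names for this sketch; a real implementation would more likely reuse whatever formatting __wt_bm_corrupt_dump already relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_DUMP_MAX 256 /* The proposal asks for the first 256 bytes of the image. */

/*
 * demo_hex_dump --
 *     Write up to DEMO_DUMP_MAX bytes of buf as lowercase hex into out, NUL-terminated. Return
 *     the number of bytes dumped, which is reduced if out is too small.
 */
static size_t
demo_hex_dump(const uint8_t *buf, size_t len, char *out, size_t out_size)
{
    size_t i, n;

    assert(buf != NULL && out != NULL && out_size > 0);

    n = len < DEMO_DUMP_MAX ? len : DEMO_DUMP_MAX;
    /* Each byte needs two characters, plus one trailing NUL for the whole string. */
    if (n > (out_size - 1) / 2)
        n = (out_size - 1) / 2;

    for (i = 0; i < n; ++i)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)buf[i]);
    out[2 * n] = '\0';
    return (n);
}
```

Capping the dump keeps the log message bounded even when the page image is large.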

      The key forensic question this answers from a single log capture: was the corruption on disk (the re-read matches the bad image) or in memory (the re-read is fine)? That determines whether we suspect persistence, cache corruption / UAF, or a hardware bit flip.
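
The comparison described above can be sketched as follows. This is an illustration under stated assumptions: the demo_* names are hypothetical, both buffers are assumed to be in the same form (for example, both decompressed), and the caller is assumed to have already checked whether the re-read image carries a legal page type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    DEMO_CORRUPT_ON_DISK,     /* Re-read block is identical to the bad in-memory image. */
    DEMO_CORRUPT_IN_MEMORY,   /* Re-read block differs and its page type is legal. */
    DEMO_CORRUPT_INCONCLUSIVE /* Re-read block differs but is itself illegal. */
} demo_corruption_source;

/*
 * demo_classify --
 *     Compare the bad in-memory image against a fresh re-read of the same block address.
 *     reread_type_ok is nonzero if the re-read image has a legal page type.
 */
static demo_corruption_source
demo_classify(const uint8_t *mem_image, const uint8_t *reread, size_t size, int reread_type_ok)
{
    assert(mem_image != NULL && reread != NULL);

    if (memcmp(mem_image, reread, size) == 0)
        return (DEMO_CORRUPT_ON_DISK);
    return (reread_type_ok ? DEMO_CORRUPT_IN_MEMORY : DEMO_CORRUPT_INCONCLUSIVE);
}
```

The third outcome is an addition of this sketch rather than part of the proposal: it covers the case where the two images differ but neither looks healthy, so the log does not overstate what is known.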

      Definition of done

      • Read-side type validation in place, with a csuite test that injects a bad type byte into a written page and verifies the panic now fires from __wt_page_inmem with full context, not from __wt_btcur_next.
      • Diagnostic dump emitted on the illegal-value path, including the re-read of the on-disk block.
      • Log output reviewed to confirm it includes enough information to distinguish on-disk vs. in-memory corruption without further customer interaction.

      References

      • WT-14750 / PR #12079 — write-side type check (this is the read-side complement).
      • In-progress investigation: SE Persistence triage of a WT_PANIC from __wt_btcur_next:878 on mongod 8.0.23 (cluster trx-sharded-shard-03-01-wueoy.mongodb.net, 2026-05-25).

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Etienne Petrel