Generalise verify read_corrupt config to all modes in wt util


    • Type: Sub-task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Tools
    • Security Level: Public (Available to anyone on the web)
    • Storage Engines - Persistence
    • 1,113.112
    • SE Persistence backlog

      The wt verify subcommand accepts a read_corrupt option to read past corrupt pages. We should extend this to dump and other subcommands that are useful for diagnostic purposes.

      Background

      wt verify accepts a -c flag that passes read_corrupt=true to session->verify(). This controls two behaviours in the verify path:

      1. Btree traversal continuation (bt_vrfy.c): when __wt_page_in fails on a child ref, verify logs the error and skips the subtree instead of aborting.
      2. Block I/O suppression: the verify path sets WT_BTREE_VERIFY on the btree, which causes block_io.c and related block readers to return an error code rather than panicking on checksum/decryption failures.

      Other diagnostic commands (dump, read, stat, list, page, printlog) have no equivalent. When any of these encounters a corrupt page, the cursor iteration fails and the command aborts. This makes the wt CLI less useful for inspecting partially corrupt or disaggregated databases.

      Goal

      All read-oriented wt subcommands should be able to continue past corrupt pages, printing or skipping what they can recover. Subcommands in scope: dump, read, stat, list, page, printlog.

      Implementation Considerations

      There are two independent layers to address:

      Layer 1 — Block I/O: WT_SESSION_QUIET_CORRUPT_FILE is the session flag checked in block_read.c, block_io.c, and block_disagg_read.c to suppress panics and return an error code instead. Setting this flag on the session before running a diagnostic command prevents crashes on corrupt block reads.

      Layer 2 — Cursor iteration: Even with Layer 1 in place, a corrupt page causes cursor->next() / cursor->prev() to return an error, which current iteration loops (e.g. dump_all_records in util_dump.c) treat as fatal. Each command's iteration loop needs to distinguish between WT_NOTFOUND (end of data), WT_ERROR/EIO in read-corrupt mode (skip and continue), and other errors (abort). This mirrors the pattern in bt_vrfy.c where verify's traversal catches __wt_page_in errors and continues.
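      The three-way error classification described for Layer 2 could look roughly like the sketch below. It is not the util_dump.c code: mock_cursor, mock_next, walk_stats and walk_records are hypothetical names used only for illustration, and the WT_ERROR / WT_NOTFOUND values are local stand-ins for the definitions in wiredtiger.h. It also assumes that the cursor can advance after a failed next() call, which would need to be confirmed against the real cursor behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; a real build would take these from wiredtiger.h. */
#define WT_ERROR (-31802)
#define WT_NOTFOUND (-31803)

/* Hypothetical mock cursor: each entry is the result of one next() call. */
typedef struct {
    const int *results;
    size_t count;
    size_t pos;
} mock_cursor;

/* Return the next scripted result, then WT_NOTFOUND once exhausted. */
static int
mock_next(mock_cursor *c)
{
    return (c->pos < c->count ? c->results[c->pos++] : WT_NOTFOUND);
}

typedef struct {
    size_t emitted; /* records successfully read */
    size_t skipped; /* read errors skipped in read-corrupt mode */
} walk_stats;

/*
 * Walk the cursor to the end. Returns 0 on a clean walk, WT_ERROR if the
 * walk finished but skipped something, or the first fatal error code.
 */
static int
walk_records(mock_cursor *c, bool read_corrupt, walk_stats *stats)
{
    int ret;

    stats->emitted = stats->skipped = 0;
    for (;;) {
        ret = mock_next(c);
        if (ret == 0) {
            ++stats->emitted; /* a real command would print the record */
            continue;
        }
        if (ret == WT_NOTFOUND) /* end of data */
            break;
        if (read_corrupt && (ret == WT_ERROR || ret == EIO)) {
            ++stats->skipped; /* corrupt page: skip and continue */
            continue;
        }
        return (ret); /* any other error, or not in read-corrupt mode */
    }
    return (stats->skipped == 0 ? 0 : WT_ERROR);
}
```

      Returning a non-zero code after a walk that skipped records keeps partial output while still letting the command exit non-zero, as the test plan expects.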

      Implementation Options

      Option A — Global CLI flag

      Add a flag to util_main.c (e.g. a new free letter at the global level). After the session is opened, if the flag is set, call F_SET((WT_SESSION_IMPL *)session, WT_SESSION_QUIET_CORRUPT_FILE) before dispatching to the subcommand. Each targeted subcommand still needs its iteration loop updated to continue past errors (Layer 2).

      • Pro: single point of change for the session flag; automatically applies to any current or future subcommand.
      • Con: -c is already a per-command flag in wt verify, so a different letter must be chosen at the global level. Does not remove the need to update each command's iteration loop.
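      A minimal sketch of the Option A dispatch step, under stated assumptions: mock_session, the subcommand table and dispatch are hypothetical stand-ins for WT_SESSION_IMPL and the util_main.c dispatch code, the flag's bit value is a placeholder, and F_SET / F_ISSET are simplified local versions of the WiredTiger macros. Parsing of the new global option is omitted because its letter has not been chosen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit value; the real flag is defined in the WiredTiger headers. */
#define WT_SESSION_QUIET_CORRUPT_FILE 0x1u

/* Simplified local versions of the flag macros. */
#define F_SET(p, mask) ((p)->flags |= (mask))
#define F_ISSET(p, mask) (((p)->flags & (mask)) != 0)

/* Hypothetical stand-in for WT_SESSION_IMPL. */
typedef struct {
    uint32_t flags;
} mock_session;

typedef int (*subcommand_fn)(mock_session *);

typedef struct {
    const char *name;
    subcommand_fn fn;
} subcommand;

/* Mock subcommands: return 1 if they run with the quiet flag set, else 0. */
static int
mock_dump(mock_session *s)
{
    return (F_ISSET(s, WT_SESSION_QUIET_CORRUPT_FILE) ? 1 : 0);
}

static int
mock_stat(mock_session *s)
{
    return (F_ISSET(s, WT_SESSION_QUIET_CORRUPT_FILE) ? 1 : 0);
}

static const subcommand subcommands[] = {
    {"dump", mock_dump},
    {"stat", mock_stat},
};

/*
 * Set the session flag once, before dispatch, so every subcommand in the
 * table inherits it. Returns -1 for an unknown subcommand name.
 */
static int
dispatch(mock_session *session, bool read_corrupt, const char *name)
{
    size_t i;

    if (read_corrupt)
        F_SET(session, WT_SESSION_QUIET_CORRUPT_FILE);
    for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); ++i)
        if (strcmp(subcommands[i].name, name) == 0)
            return (subcommands[i].fn(session));
    return (-1);
}
```

      This covers Layer 1 only: a subcommand added to the table later would pick up the flag without further changes, but would still need its own iteration loop updated for Layer 2.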

      Option B — Per-subcommand flag (matches verify's existing -c pattern)

      Add -c to each targeted subcommand. Each command sets WT_SESSION_QUIET_CORRUPT_FILE on the session and updates its iteration loop to continue past errors.

      • Pro: consistent with wt verify -c; each command is self-contained.
      • Con: changes required in 6+ files (util_dump.c, util_read.c, util_stat.c, util_list.c, util_page.c, util_printlog.c).

      Test Plan

      • Write a Python suite test that creates a table, corrupts a page on disk, then verifies that wt dump -c, wt read -c, and wt stat -c produce partial output and exit non-zero rather than crashing or producing no output.
      • Confirm wt verify -c behaviour is unchanged.
      • Confirm that without -c, commands still abort on the first corrupt page.

            Assignee:
            Dylan Liang
            Reporter:
            Sean Watt
            Votes:
            0
            Watchers:
            2