Disaggregated block read corruption logging ignores read_corrupt/verify modes


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Block Manager
    • Storage Engines - Persistence
    • 128.311
    • SE Persistence backlog

      Background:
      WT-17348 generalised verify's read_corrupt mode to the wt utility and switched the panic decision in the disaggregated read corruption handler from WT_SESSION_QUIET_CORRUPT_FILE to WT_SESSION_READ_CORRUPT_OK(session). The surrounding error-logging suppression in __block_disagg_read_multiple (src/block_disagg/block_disagg_read.c) was not updated and still keys off WT_SESSION_QUIET_CORRUPT_FILE alone.
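      The divergence can be illustrated with a small Python model of the two decisions in __block_disagg_read_multiple. This is a sketch under stated assumptions, not WiredTiger code: the flag values, the Session class, and the helper names are hypothetical, and WT_SESSION_READ_CORRUPT_OK(session) is assumed to be true when either WT_SESSION_QUIET_CORRUPT_FILE or WT_SESSION_READ_SKIP_CORRUPT is set.

```python
from dataclasses import dataclass
from enum import IntFlag


class SessionFlag(IntFlag):
    # Hypothetical stand-ins for WiredTiger session flags; the real values differ.
    NONE = 0
    QUIET_CORRUPT_FILE = 0x1  # models WT_SESSION_QUIET_CORRUPT_FILE
    READ_SKIP_CORRUPT = 0x2  # models WT_SESSION_READ_SKIP_CORRUPT (wt -q)


@dataclass
class Session:
    flags: SessionFlag = SessionFlag.NONE


def read_corrupt_ok(session: Session) -> bool:
    # Assumption: models WT_SESSION_READ_CORRUPT_OK(session) as "either flag is set".
    tolerated = SessionFlag.QUIET_CORRUPT_FILE | SessionFlag.READ_SKIP_CORRUPT
    return bool(session.flags & tolerated)


def panics_on_corruption(session: Session) -> bool:
    # Panic decision after WT-17348: keyed off READ_CORRUPT_OK.
    return not read_corrupt_ok(session)


def logs_corruption(session: Session) -> bool:
    # Logging guard left unchanged: keyed off QUIET_CORRUPT_FILE only.
    return not (session.flags & SessionFlag.QUIET_CORRUPT_FILE)


if __name__ == "__main__":
    for flags in (SessionFlag.NONE, SessionFlag.QUIET_CORRUPT_FILE, SessionFlag.READ_SKIP_CORRUPT):
        s = Session(flags)
        print(f"{flags.name}: panic={panics_on_corruption(s)} log={logs_corruption(s)}")
```

      In this model the READ_SKIP_CORRUPT session does not panic, yet it still logs: the read is allowed to continue, but the messages are not suppressed.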

      Problem:
      The __block_disagg_read_err() calls (checksum/header mismatch) and the __wt_log_data_dump() "corrupt dump" are guarded by !F_ISSET(session, WT_SESSION_QUIET_CORRUPT_FILE). They honor neither WT_SESSION_READ_SKIP_CORRUPT (the wt -q mode) nor a handle opened for verify. As a result:

      • During WT_SESSION::verify, the disagg block manager emits per-block "checksum doesn't match" and "corrupt dump" messages, whereas the file-based block managers (block/block_read.c) stay silent during verify (they return before dumping when block->verify is set).
      • In wt -q / read_corrupt mode, the per-block corruption messages are still printed, contrary to the "continue quietly, produce partial output" intent of the flag.

      This is lower severity than the panic gap (noise, not correctness), but it stems from the same blind spot: the surrounding code does not account for verify/read-corrupt mode.

      Tasks:

      • Align the logging-suppression guards in __block_disagg_read_multiple with the panic guard: suppress when WT_SESSION_READ_CORRUPT_OK(session) or the btree is in verify (F_ISSET(S2BT(session), WT_BTREE_VERIFY)).
      • Confirm consistency with the file-based block managers (block_read.c, block_io.c).
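      A minimal Python model of the aligned guard is sketched below. The flag values, the Session and Btree classes, and should_log_corruption are hypothetical; read_corrupt_ok assumes WT_SESSION_READ_CORRUPT_OK(session) is true when WT_SESSION_QUIET_CORRUPT_FILE or WT_SESSION_READ_SKIP_CORRUPT is set. In C the guard would presumably read along the lines of !WT_SESSION_READ_CORRUPT_OK(session) && !F_ISSET(S2BT(session), WT_BTREE_VERIFY).

```python
from dataclasses import dataclass, field
from enum import IntFlag


class SessionFlag(IntFlag):
    # Hypothetical stand-ins for WiredTiger session flags; the real values differ.
    NONE = 0
    QUIET_CORRUPT_FILE = 0x1  # models WT_SESSION_QUIET_CORRUPT_FILE
    READ_SKIP_CORRUPT = 0x2  # models WT_SESSION_READ_SKIP_CORRUPT (wt -q)


@dataclass
class Btree:
    verify: bool = False  # models F_ISSET(S2BT(session), WT_BTREE_VERIFY)


@dataclass
class Session:
    flags: SessionFlag = SessionFlag.NONE
    btree: Btree = field(default_factory=Btree)


def read_corrupt_ok(session: Session) -> bool:
    # Assumed model of WT_SESSION_READ_CORRUPT_OK(session).
    tolerated = SessionFlag.QUIET_CORRUPT_FILE | SessionFlag.READ_SKIP_CORRUPT
    return bool(session.flags & tolerated)


def should_log_corruption(session: Session) -> bool:
    # Proposed guard: stay quiet when corruption is tolerated or a verify is running.
    return not (read_corrupt_ok(session) or session.btree.verify)
```

      For example, should_log_corruption(Session(SessionFlag.READ_SKIP_CORRUPT)) returns False. Reusing the same predicate as the panic decision keeps the two guards from drifting apart again; the verify check is ORed in separately because, per the task above, verify appears to be tracked on the btree rather than as a session flag.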

      Definition of Done:

      • Disagg verify and wt -q reads no longer emit per-block corruption log spam, matching the file-based block managers.
      • Existing tests pass (test_verify_disagg*, test_util_read_corrupt); test_verify_disagg04 no longer needs to ignore "corrupt dump" / "read checksum error" stderr (or the ignores are intentionally retained with justification).
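      Once the guards are aligned, the stderr handling in test_verify_disagg04 could become a positive check rather than an ignore. A minimal sketch of such a check as a standalone helper; the function name is hypothetical and the patterns are taken from the messages quoted in this ticket, not from the test's actual code:

```python
import re

# Patterns based on the messages quoted in this ticket; exact wording is an assumption.
CORRUPTION_PATTERNS = (
    re.compile(r"corrupt dump"),
    re.compile(r"read checksum error"),
    re.compile(r"checksum doesn't match"),
)


def corruption_log_lines(stderr_text: str) -> list[str]:
    """Return the stderr lines that look like per-block corruption messages."""
    return [
        line
        for line in stderr_text.splitlines()
        if any(p.search(line) for p in CORRUPTION_PATTERNS)
    ]
```

      After running verify (or a wt -q read) against a deliberately corrupted disagg table, the test would assert that corruption_log_lines(captured_stderr) is empty.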

      References: src/block_disagg/block_disagg_read.c:281,286,293; introduced/missed in WT-17348 (commit 8236e64).

            Assignee:
            Dylan Liang
            Reporter:
            Etienne Petrel
            Votes:
            0
            Watchers:
            1

              Created:
              Updated: