  1. WiredTiger
  2. WT-10601

Fix wt verify -c failure when first block on page is corrupt

    • 5
    • 2023-05-30 - 7.0 Readiness, StorEng - 2023-06-13
    • v7.0, v6.0, v5.0

      There is a corner case that causes the wt verify -c command to fail.

      On a corrupt file received from a customer, the command reports a checksum mismatch, then exits with the following errors:

      [1676317374:653562][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Assertion 'vs->max_addr->size != 0' failed: 
      [1676317374:653568][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Expression returned false
      [1676317374:653573][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: void __wt_abort(WT_SESSION_IMPL *), 28: aborting WiredTiger library

      The error occurs in __verify_tree() while processing a row-store internal page. A loop iterates over the page's children and does the following (pseudo-code):

      foreach child:
          if not first child:
              __verify_row_int_key_order()
          load the child page
          if error:
              continue
          verify child's subtree

      In this case, the first child page is the corrupt one. The first iteration therefore takes the error path, skips the "verify child's subtree" step, and continues with the next child, where __verify_row_int_key_order() is called. That function appears to rely on state set up while verifying the previous child's subtree. Judging by the failed assertion, this includes vs->max_addr, whose size is still zero. Because no subtree has been verified yet, that state is missing or incorrect and the assertion fails.

      As a quick hack, I was able to avoid the failure by skipping __verify_row_int_key_order() when the first child page is corrupt. We still need to understand this code path better before settling on a proper fix.
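
      The failure mode and the quick hack can be illustrated with a small stand-alone model. This is only a sketch, not WiredTiger code: struct child, struct verify_state, key_order_check(), verify_children(), and the return codes are all hypothetical names, and the max_set flag merely stands in for the 'vs->max_addr->size != 0' condition from the assertion. The guard shown here skips the key-order check until some earlier subtree has been verified, which is an assumed and slightly broader form of "skip when the first child is corrupt".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one child of an internal page. */
struct child {
    int first_key; /* smallest key in the child's subtree */
    int last_key;  /* largest key in the child's subtree */
    bool corrupt;  /* true if loading the child fails (e.g. checksum mismatch) */
};

/* Hypothetical stand-in for the verify state carried from child to child. */
struct verify_state {
    bool max_set; /* models the "vs->max_addr->size != 0" condition */
    int max_key;  /* largest key in the most recently verified subtree */
};

enum { VERIFY_OK = 0, VERIFY_STATE_MISSING = -1, VERIFY_OUT_OF_ORDER = -2 };

/* Models the key-order check, which needs state from a verified subtree. */
static int
key_order_check(const struct verify_state *vs, const struct child *c)
{
    if (!vs->max_set)
        return (VERIFY_STATE_MISSING); /* the real code asserts at this point */
    return (c->first_key > vs->max_key ? VERIFY_OK : VERIFY_OUT_OF_ORDER);
}

/*
 * Walk the children. With skip_when_unset false, this follows the pseudo-code
 * in the ticket. With it true, the key-order check is skipped until an earlier
 * subtree has been verified (the assumed form of the quick hack).
 */
static int
verify_children(const struct child *children, size_t n, bool skip_when_unset, size_t *corruptp)
{
    struct verify_state vs = {false, 0};
    size_t i;
    int ret;

    assert(children != NULL && corruptp != NULL);
    *corruptp = 0;

    for (i = 0; i < n; ++i) {
        bool skip_check = (i == 0) || (skip_when_unset && !vs.max_set);

        if (!skip_check) {
            ret = key_order_check(&vs, &children[i]);
            if (ret != VERIFY_OK)
                return (ret);
        }
        if (children[i].corrupt) { /* "load the child page" failed */
            ++*corruptp;
            continue;
        }
        /* "verify child's subtree": record the largest key seen. */
        vs.max_key = children[i].last_key;
        vs.max_set = true;
    }
    return (VERIFY_OK);
}
```

      In this model, a page whose first child is corrupt makes verify_children() return VERIFY_STATE_MISSING when the guard is off, and lets it finish while counting one corrupt child when the guard is on. Whether the real fix should skip the check, or instead initialize the missing state some other way, is exactly the open question above.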

            Assignee: Jie Chen (jie.chen@mongodb.com)
            Reporter: Keith Smith (keith.smith@mongodb.com)