Type: Bug
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: WT11.2.0, 7.1.0-rc0, 7.0.0-rc7, 6.0.12, 5.0.24
Affects Version/s: None
Component/s: None
Labels: supportability
Sprint: 2023-05-30 - 7.0 Readiness, StorEng - 2023-06-13
Story Points: 5
Backport Requested: v7.0, v6.0, v5.0

There is a corner case that causes the wt verify -c command to fail.

On a corrupt file received from a customer, the command reports a checksum mismatch, then exits with the following errors:

[1676317374:653562][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Assertion 'vs->max_addr->size != 0' failed: 
[1676317374:653568][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Expression returned false
[1676317374:653573][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: void __wt_abort(WT_SESSION_IMPL *), 28: aborting WiredTiger library

The error happens in __verify_tree() when processing an internal row store page. There is a loop that iterates over the children doing the following (pseudo-code):

foreach child:
    if not first child:
        __verify_row_int_key_order()
    load the child page
    if error:
        continue
    verify child's subtree

In this case the first child page is the corrupt one. The first iteration therefore skips the "verify child's subtree" step, and the second iteration starts by calling __verify_row_int_key_order() on the next child. That function apparently relies on state that is set up while verifying the previous child's subtree; judging by the failed assertion, this includes vs->max_addr. Because that state is missing or incorrect here, the assertion fails.
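The failure mode described above can be reproduced in a toy model. The sketch below is not WiredTiger code: every name prefixed with model_ is hypothetical, the single field stands in for vs->max_addr->size from the assertion message, and the assumption that verifying a subtree is what populates that state comes from the description above rather than from the source tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the verify state; only the asserted field is modelled. */
struct model_vstuff {
    size_t max_addr_size; /* models vs->max_addr->size; 0 means "never set" */
};

/* Hypothetical child descriptor: true if loading the child page fails. */
struct model_child {
    bool corrupt;
};

/* Models the key-order check: it passes only if earlier state exists. */
static bool
model_key_order_check(const struct model_vstuff *vs)
{
    return (vs->max_addr_size != 0);
}

/* Models verifying a child's subtree; assumed to set the state as a side effect. */
static void
model_verify_subtree(struct model_vstuff *vs)
{
    vs->max_addr_size = 1;
}

/*
 * Mirrors the pseudo-code loop. Returns the index of the first child whose
 * key-order check would trip the assertion, or -1 if none does.
 */
static int
model_verify_children(const struct model_child *children, size_t n)
{
    struct model_vstuff vs = {0};
    size_t i;

    for (i = 0; i < n; i++) {
        if (i != 0 && !model_key_order_check(&vs))
            return ((int)i);
        if (children[i].corrupt)
            continue; /* load failed: subtree is skipped, state is not set */
        model_verify_subtree(&vs);
    }
    return (-1);
}
```

In this model, a corrupt first child makes the check fail at index 1, while a corrupt child at any later position does not, because an earlier subtree has already set the state.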

As a quick hack, I was able to avoid the failure by skipping __verify_row_int_key_order() when the first child page is corrupt. However, we need to understand what is going on here better before we can come up with a proper fix.
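One possible reading of that quick hack, again as a toy model rather than WiredTiger code: track whether any earlier child has been verified, and run the key-order check only once that is true. All sketch_ names are hypothetical, and this is not claimed to be the fix that was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child descriptor: true if loading the child page fails. */
struct sketch_child {
    bool corrupt;
};

/* Counters so the effect of the guard can be observed. */
struct sketch_result {
    int order_checks;  /* key-order checks that were actually run */
    int skipped_pages; /* children skipped because they failed to load */
};

/*
 * Same loop shape as the pseudo-code, with one extra guard: the key-order
 * check runs only after some earlier child's subtree has been verified.
 */
static struct sketch_result
sketch_verify_children(const struct sketch_child *children, size_t n)
{
    struct sketch_result res = {0, 0};
    bool have_prev_state = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i != 0 && have_prev_state)
            res.order_checks++; /* stands in for the real key-order check */
        if (children[i].corrupt) {
            res.skipped_pages++;
            continue;
        }
        have_prev_state = true; /* stands in for verifying the subtree */
    }
    return (res);
}
```

With a corrupt first child among three, the model skips one page and runs a single key-order check (for the third child) instead of asserting on the second. The trade-off is that the ordering between the corrupt child and its neighbor is never checked.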

is caused by: WT-9821 Add option to verify to report all data corruption in a file (Closed)

Assignee: Jie Chen
Reporter: Keith Smith
Votes: 0
Watchers: 3

Created: Feb 13 2023 07:57:47 PM UTC
Updated: Dec 28 2023 08:45:54 AM UTC
Resolved: Jun 06 2023 02:22:38 AM UTC
