There is a corner case that causes the wt verify -c command to fail.
On a corrupt file received from a customer, the command reports a checksum mismatch, then exits with the following errors:
[1676317374:653562][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Assertion 'vs->max_addr->size != 0' failed:
[1676317374:653568][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: int __verify_row_int_key_order(WT_SESSION_IMPL *, WT_PAGE *, WT_REF *, uint32_t, WT_VSTUFF *), 688: Expression returned false
[1676317374:653573][32684:0x7ff0f6f41440], wt, file:collection-125--3534195884026267665.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: void __wt_abort(WT_SESSION_IMPL *), 28: aborting WiredTiger library
The error happens in __verify_tree() when processing an internal row store page. There is a loop that iterates over the children doing the following (pseudo-code):
foreach child:
    if not first child:
        __verify_row_int_key_order()
    load the child page
    if error:
        continue
    verify child's subtree
In this case the first child page is the corrupt one, so loading it fails and the first iteration skips the "verify child's subtree" step. The second iteration then calls __verify_row_int_key_order() on the next child. That function apparently relies on state that is set up while verifying the previous child's subtree; the failed assertion on vs->max_addr->size suggests the state in question is vs->max_addr. Because no subtree has been verified yet, that state is missing or incorrect and the assertion fires.
As a quick hack, I was able to avoid the failure by skipping __verify_row_int_key_order() when the first child page is corrupt. However, we need to better understand what is going on here to come up with a proper fix.
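To make the failure mode concrete, here is a minimal, self-contained C model of the loop. It is a sketch under stated assumptions, not WiredTiger code: struct verify_state, struct child, key_order_check(), verify_children() and the skip_if_no_prior flag are all hypothetical names, and the only state modeled is a size field standing in for vs->max_addr->size. The guard shown is one possible reading of the quick hack (run the key-order check only after some earlier sibling has been verified) and may differ from the eventual fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the verify state: only the field the assertion checks. */
struct verify_state {
    size_t max_addr_size; /* models vs->max_addr->size; 0 means "never set" */
};

/* Hypothetical stand-in for one child of an internal page. */
struct child {
    bool corrupt;     /* true if loading the child page fails */
    size_t addr_size; /* nonzero size recorded once the subtree is verified */
};

enum { VERIFY_OK = 0, VERIFY_ASSERT = -1 };

/* Models the key-order check, which needs state left by a verified sibling. */
static int
key_order_check(const struct verify_state *vs)
{
    return (vs->max_addr_size != 0 ? VERIFY_OK : VERIFY_ASSERT);
}

/*
 * Walk the children as in the pseudo-code. When skip_if_no_prior is true, the
 * key-order check is skipped until some earlier sibling has been verified
 * (an assumed interpretation of the quick hack, not the actual patch).
 */
static int
verify_children(const struct child *children, size_t n, bool skip_if_no_prior,
    size_t *corrupt_count)
{
    struct verify_state vs = {0};
    bool have_prior = false;
    size_t i;

    *corrupt_count = 0;
    for (i = 0; i < n; ++i) {
        if (i != 0 && (!skip_if_no_prior || have_prior))
            if (key_order_check(&vs) != VERIFY_OK)
                return (VERIFY_ASSERT);

        if (children[i].corrupt) { /* "load the child page" failed */
            ++*corrupt_count;
            continue;
        }

        /* "verify child's subtree": the step that sets up the state. */
        vs.max_addr_size = children[i].addr_size;
        have_prior = true;
    }
    return (VERIFY_OK);
}
```

In this model, calling verify_children() on a child list whose first entry is corrupt returns VERIFY_ASSERT when skip_if_no_prior is false, mirroring the abort above, and returns VERIFY_OK with one corrupt page counted when it is true.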
Is caused by: WT-9821 Add option to verify to report all data corruption in a file (Closed)