Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT11.3.0, 7.3.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Epic Link: SPM-3292
Assigned Teams: Storage Engines
Sprint: 2023-12-12 - Heisenbug
Story Points: 3

Some of our patch builds were failing test_verify.py consistently with the following error:

[2023/11/29 04:25:34.376] [1701231595:894982][149352:0x7f25c4d98800], test_verify.test_verify.test_verify_api_corrupt_first_page, file:test_verify.a.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: __wt_block_read_off, 234: test_verify.a.wt: potential hardware corruption, read checksum error for 28672B block at offset 4096: calculated block checksum of 0xd4b774a doesn't match expected checksum of 0x5ccb8e3e

The test deliberately corrupts the data file and then runs verify on it.
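
For readers unfamiliar with this kind of test, here is a minimal sketch of the mechanism behind the error message: a block's checksum is recorded at write time, the block is deliberately damaged, and the check on read then fails. Everything here is hypothetical. The `sketch_*` names are made up, and the bitwise CRC-32 is only a stand-in, not WiredTiger's actual checksum routine or block format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), used only as a stand-in checksum. */
uint32_t
sketch_checksum(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Return 1 if the block still matches the checksum recorded at write time, else 0. */
int
sketch_block_ok(const uint8_t *block, size_t len, uint32_t expected)
{
    return sketch_checksum(block, len) == expected;
}

/* Simulate the test's deliberate corruption by inverting one byte of the block. */
void
sketch_corrupt(uint8_t *block, size_t len, size_t offset)
{
    if (offset < len)
        block[offset] ^= 0xFFu;
}
```

Calling `sketch_block_ok` before and after `sketch_corrupt` with the same expected checksum shows the transition from a clean read to a checksum mismatch of the kind reported above.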

The patch below fixed the errors we were seeing by skipping prefetch once the connection has been flagged as having a corrupted block (WT_CONN_DATA_CORRUPTION):

--- a/src/conn/conn_prefetch.c
+++ b/src/conn/conn_prefetch.c
@@ -108,7 +108,13 @@ __wt_prefetch_thread_run(WT_SESSION_IMPL *session, WT_THREAD *thread)
         __wt_spin_unlock(session, &conn->prefetch_lock);
         locked = false;
 
-        WT_WITH_DHANDLE(session, pe->dhandle, ret = __wt_prefetch_page_in(session, pe));
+        /*
+         * It's a weird case, but if verify is utilizing prefetch and encounters a corrupted
+         * block, stop using prefetch. Some of the guarantees about ref and page freeing are
+         * ignored in that case, which can invalidate entries on the prefetch queue.
+         */
+        if (!F_ISSET(S2C(session), WT_CONN_DATA_CORRUPTION) && pe->ref->page_del == NULL)
+            WT_WITH_DHANDLE(session, pe->dhandle, ret = __wt_prefetch_page_in(session, pe));
         /*
          * It probably isn't strictly necessary to re-acquire the lock to reset the flag, but other
          * flag accesses do need to lock, so it's better to be consistent.
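
To illustrate the shape of the guard outside the WiredTiger tree, here is a small self-contained sketch. It assumes queue entries are still dequeued after the flag is set and only the page-in step is skipped. The `sketch_*` types and function are hypothetical stand-ins, not WiredTiger structures, and the boolean field merely plays the role of WT_CONN_DATA_CORRUPTION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection state. */
struct sketch_conn {
    bool data_corruption; /* Plays the role of WT_CONN_DATA_CORRUPTION. */
};

/* Hypothetical stand-in for a prefetch queue entry. */
struct sketch_entry {
    int page_id;
};

/*
 * Walk the queue and "read in" pages only while no corruption has been flagged.
 * Every entry is consumed either way, so the queue still drains after the flag
 * is set. Returns the number of pages actually prefetched; their ids are
 * written to the prefetched array, which must have room for n entries.
 */
size_t
sketch_prefetch_drain(
  const struct sketch_conn *conn, const struct sketch_entry *queue, size_t n, int *prefetched)
{
    size_t done = 0;

    assert(conn != NULL && prefetched != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Once corruption is flagged, queued refs may be stale: skip the page-in. */
        if (conn->data_corruption)
            continue;
        prefetched[done++] = queue[i].page_id;
    }
    return done;
}
```

With the flag clear, every queued page is prefetched. With it set, the loop consumes the entries without touching them, which mirrors the intent described in the patch comment.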

Related to: WT-12135 Don't re-open connections and sessions with pre-fetching enabled after damaging tables (Closed)

Assignee: Clarisse Cheah
Reporter: Monica Ng
Votes: 0
Watchers: 1

Created: Dec 04 2023 03:17:39 AM UTC
Updated: Mar 04 2024 07:11:32 AM UTC
Resolved: Dec 06 2023 02:51:13 AM UTC
