Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-12072

Disable pre-fetching for corrupted pages

    • Type: Icon: Task Task
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • WT11.3.0, 7.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Storage Engines
    • 3
    • 2023-12-12 - Heisenbug

      Some of our patch builds were failing test_verify.py consistently with the following error:

      [2023/11/29 04:25:34.376] [1701231595:894982][149352:0x7f25c4d98800], test_verify.test_verify.test_verify_api_corrupt_first_page, file:test_verify.a.wt, WT_SESSION.verify: [WT_VERB_DEFAULT][ERROR]: __wt_block_read_off, 234: test_verify.a.wt: potential hardware corruption, read checksum error for 28672B block at offset 4096: calculated block checksum of 0xd4b774a doesn't match expected checksum of 0x5ccb8e3e
      

      The test deliberately corrupts the data file and runs verify after.

      The below patch fixed the errors we were seeing by avoiding prefetch whenever we detect a corrupted block:

      --- a/src/conn/conn_prefetch.c
      +++ b/src/conn/conn_prefetch.c
      @@ -108,7 +108,13 @@ __wt_prefetch_thread_run(WT_SESSION_IMPL *session, WT_THREAD *thread)
               __wt_spin_unlock(session, &conn->prefetch_lock);
               locked = false;
       
      -        WT_WITH_DHANDLE(session, pe->dhandle, ret = __wt_prefetch_page_in(session, pe));
      +        /*
      +         * It's a weird case, but if verify is utilizing prefetch and encounters a corrupted
      +         * block, stop using prefetch. Some of the guarantees about ref and page freeing are
      +         * ignored in that case, which can invalidate entries on the prefetch queue.
      +         */
      +        if (!F_ISSET(S2C(session), WT_CONN_DATA_CORRUPTION) && pe->ref->page_del != NULL)
      +            WT_WITH_DHANDLE(session, pe->dhandle, ret = __wt_prefetch_page_in(session, pe));
               /*
                * It probably isn't strictly necessary to re-acquire the lock to reset the flag, but other
                * flag accesses do need to lock, so it's better to be consistent.
      

            Assignee:
            clarisse.cheah@mongodb.com Clarisse Cheah
            Reporter:
            monica.ng@mongodb.com Monica Ng
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: