WiredTiger / WT-5972

4.4 branch creates trees 4.2 cannot verify (and may not be able to support).

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      alex.cameron@10gen.com, Alexander Gorrod: while testing further with WT-5934, I found a problem.

      Currently, the 4.2 branch skips unstable entries after a downgrade. This is the code in bt_page.c:

      static inline bool
      __unstable_skip(WT_SESSION_IMPL *session, const WT_PAGE_HEADER *dsk, WT_CELL_UNPACK *unpack) 
      {            
          /*       
           * We should never see a prepared cell, it implies an unclean shutdown followed by a downgrade
           * (clean shutdown rolls back any prepared cells). Complain and ignore the row.
           */      
          if (F_ISSET(unpack, WT_CELL_UNPACK_PREPARE)) {
              __wt_err(session, EINVAL, "unexpected prepared cell found, ignored");
              return (true);
          }       
                  
          /*      
           * Skip unstable entries after downgrade to releases without validity windows and from previous
           * wiredtiger_open connections.
           */      
          return ((unpack->stop_ts != WT_TS_MAX || unpack->stop_txn != WT_TXN_MAX) && 
            (S2C(session)->base_write_gen > dsk->write_gen || !__wt_process.page_version_ts));
      }
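
As a rough illustration of how this predicate behaves, here is a standalone sketch in C. Every name in it (`sketch_cell`, `sketch_ctx`, `SKETCH_TS_MAX`, `SKETCH_TXN_MAX`, `sketch_unstable_skip`) is a hypothetical stand-in rather than WiredTiger code, and the maximum values are assumed to be `UINT64_MAX` purely for the sketch. It mirrors only the boolean logic of the function quoted above and omits the error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for WT_TS_MAX and WT_TXN_MAX; the real values come from WiredTiger headers. */
#define SKETCH_TS_MAX UINT64_MAX
#define SKETCH_TXN_MAX UINT64_MAX

/* Hypothetical cell: only the fields the predicate reads. */
struct sketch_cell {
    bool prepared;     /* models WT_CELL_UNPACK_PREPARE being set */
    uint64_t stop_ts;  /* models unpack->stop_ts */
    uint64_t stop_txn; /* models unpack->stop_txn */
};

/* Hypothetical context: the connection, page header, and process state the predicate reads. */
struct sketch_ctx {
    uint64_t base_write_gen; /* models S2C(session)->base_write_gen */
    uint64_t page_write_gen; /* models dsk->write_gen */
    bool page_version_ts;    /* models __wt_process.page_version_ts */
};

/* Return true if the entry would be dropped when the page is read in. */
static bool
sketch_unstable_skip(const struct sketch_ctx *ctx, const struct sketch_cell *cell)
{
    assert(ctx != NULL && cell != NULL);

    /* Prepared cells are always ignored. */
    if (cell->prepared)
        return (true);

    /*
     * An entry with a stop time or stop transaction is skipped if the page was written by an
     * earlier connection, or if this release has no validity-window support.
     */
    return ((cell->stop_ts != SKETCH_TS_MAX || cell->stop_txn != SKETCH_TXN_MAX) &&
      (ctx->base_write_gen > ctx->page_write_gen || !ctx->page_version_ts));
}
```

In this model, an entry with no stop time and no stop transaction is always kept, while any entry carrying a stop value is dropped after a downgrade. Nothing in the predicate guarantees that at least one entry on a page survives.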
      

      However, this means that a page with nothing but unstable entries on it will be read in as an empty page. The 4.2 verify code (and perhaps the run-time code, for all I know) isn't prepared to accept pages that have no entries on them. The verify failure looks like this:

      wt, file:wt.wt, WT_SESSION.verify: __verify_row_int_key_order, 585: vs->max_addr->size != 0
      wt, file:wt.wt, WT_SESSION.verify: __wt_abort, 28: aborting WiredTiger library
       
      Thread 1 "lt-wt" received signal SIGABRT, Aborted.
      (gdb) where
      #0  0x00007ffff668fe97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007ffff6691801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00007ffff7ac53cc in __wt_abort (session=0x7ffff7fb2ea0) at src/os_common/os_abort.c:30
      #3  0x00007ffff79e8db2 in __verify_row_int_key_order (session=0x7ffff7fb2ea0, parent=0x6e45a0, 
          ref=0x6e54c0, entry=2, vs=0x7fffffffdbb0) at src/btree/bt_vrfy.c:585
      #4  0x00007ffff79e8b23 in __verify_tree (session=0x7ffff7fb2ea0, ref=0x6e5010, 
          addr_unpack=0x7fffffffd8f0, vs=0x7fffffffdbb0) at src/btree/bt_vrfy.c:547
      #5  0x00007ffff79e8c1e in __verify_tree (session=0x7ffff7fb2ea0, ref=0x672bb8, 
          addr_unpack=0x7fffffffe250, vs=0x7fffffffdbb0) at src/btree/bt_vrfy.c:556
      #6  0x00007ffff79e7b07 in __wt_verify (session=0x7ffff7fb2ea0, cfg=0x7fffffffe4e0)
          at src/btree/bt_vrfy.c:234
      

      What's happening is that the 4.2 verify code assumes a leaf page will always be verified before an internal page (because the walk is depth-first), and that the leaf page will have a key on it. That continues to be true in 4.4, but 4.2 can read in a 4.4 leaf page and not find any entries, so there's no key and 4.2 asserts.
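
The failure mode can be sketched with a small, simplified model in C. This is not the WiredTiger verify code: `demo_cell`, `demo_verify_state`, `demo_read_leaf`, and `demo_internal_key_check` are hypothetical names, and the `max_key` pointer is only an assumed stand-in for the state that the `vs->max_addr->size != 0` assertion checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical on-disk cell: a key plus whether the 4.2 reader would skip the entry. */
struct demo_cell {
    const char *key;
    bool unstable; /* true if the skip predicate would drop this entry */
};

/* Hypothetical verify state: the largest key supplied by any leaf so far, NULL if none. */
struct demo_verify_state {
    const char *max_key;
};

/*
 * Model reading and verifying a leaf page: unstable entries are dropped, and the verify state is
 * updated only if at least one entry survives. Returns the number of surviving entries.
 */
static size_t
demo_read_leaf(const struct demo_cell *cells, size_t n, struct demo_verify_state *vs)
{
    size_t i, kept;

    assert(cells != NULL && vs != NULL);

    kept = 0;
    for (i = 0; i < n; i++) {
        if (cells[i].unstable)
            continue;
        vs->max_key = cells[i].key; /* cells are assumed to be sorted by key */
        kept++;
    }
    return (kept);
}

/* Model the internal-page check: it assumes an earlier leaf already supplied a key. */
static bool
demo_internal_key_check(const struct demo_verify_state *vs)
{
    assert(vs != NULL);
    return (vs->max_key != NULL);
}
```

With a first leaf whose entries are all unstable, `demo_read_leaf` returns 0 and leaves `max_key` unset, so `demo_internal_key_check` returns false. That corresponds, in this simplified model, to the point where 4.2 asserts.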

      I'm not seeing an obvious solution; I'm pretty sure we're not prepared to assert that the 4.2 release can handle such empty pages.

              People

              Assignee:
              backlog-server-storage-engines Backlog - Storage Engines Team
              Reporter:
              keith.bostic Keith Bostic