Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6385

test_incr_backup failure on 4.2 branch

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Blocker - P1 Blocker - P1
    • None
    • Affects Version/s: 4.2.7
    • Component/s: None
    • Labels:
      None
    • v4.4, v4.2

      Running the incremental backup csuite test, test/csuite/incr_backup, produces occasional checksum mismatches running at the head of the WT 4.2 branch.  

      The problem occurs when the test reads a backup and gets a block of zeros instead of the data it expected.  This suggests that a required block didn't get copied to the backup.

       

      $ ./test_incr_backup -S 2142420118438890895
      Seed: 2142420118438890895
      [1591384524:413303][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_block_read_off, 283: t21-1.wt: read checksum error for 28672B block at offset 1093632: block header checksum of 0 doesn't match expected checksum of 0x48c0a49b
      [1591384524:413426][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_bm_corrupt_dump, 135: {1093632, 28672, 0x48c0a49b}: (chunk 1 of 28): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
      

      The error occurs while replaying the log when opening the backup:

      (gdb) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #1  0x00007fa935906801 in __GI_abort () at abort.c:79
      #2  0x0000000000467b73 in __wt_abort (session=session@entry=0x2f5fd00)
          at ../src/os_common/os_abort.c:30
      #3  0x0000000000404716 in __wt_panic (session=session@entry=0x2f5fd00) at ../src/support/err.c:497
      #4  0x00000000005a3eeb in __wt_block_read_off (session=session@entry=0x2f5fd00, 
          block=block@entry=0x394ea10, buf=buf@entry=0x7fff073eb9e0, offset=1093632, size=28672, 
          checksum=1220584603) at ../src/block/block_read.c:292
      #5  0x00000000005a402a in __wt_bm_read (bm=0x394e8e0, session=0x2f5fd00, buf=0x7fff073eb9e0, 
          addr=<optimized out>, addr_size=<optimized out>) at ../src/block/block_read.c:99
      #6  0x00000000004ef8f3 in __wt_bt_read (session=session@entry=0x2f5fd00, 
          buf=buf@entry=0x7fff073eb9e0, 
          addr=0x3956287 "\300ʇ\344H\300\204[!key-48-60\210\300ц\344uO\326\070!key-50-10\210\300\036\207\344\005\376-\225%key-53-100\210\300%\207\344\026\236S\331!key-56-40\210\300,\206\344\261\355[Y\031key-590\210\300E\207\344\315q\302\234%key-61-100\210\300L\204\344\022\364\034N\031key-630\210\300P\206\344L\v\274\374!key-65-20\210\300V\207\344\004\033?U!key-68-50\210\300]\207\344P\262D\267!key-71-20\210\300d\205\344\207\027\250a!key"..., addr_size=8) at ../src/btree/bt_io.c:38
      #7  0x00000000004f8360 in __page_read (session=session@entry=0x2f5fd00, ref=ref@entry=0x3957d80, 
          flags=flags@entry=1024) at ../src/btree/bt_read.c:480
      #8  0x00000000004f96ee in __wt_page_in_func (session=session@entry=0x2f5fd00, 
          ref=ref@entry=0x3957d80, flags=flags@entry=1024, 
          func=func@entry=0x5ddd50 <__func__.41945> "__wt_row_search", line=line@entry=425)
          at ../src/btree/bt_read.c:640
      #9  0x0000000000529a45 in __wt_page_swap_func (func=<synthetic pointer>, line=425, line=425, 
          func=<synthetic pointer>, flags=1024, want=0x3957d80, held=0x394dc78, session=0x2f5fd00)
          at ../src/include/btree.i:1631
      #10 __wt_row_search (cbt=cbt@entry=0x3958bf0, srch_key=srch_key@entry=0x3958cf0, 
          insert=insert@entry=true, leaf=leaf@entry=0x0, leaf_safe=leaf_safe@entry=false, 
          leaf_foundp=leaf_foundp@entry=0x0) at ../src/btree/row_srch.c:425
      #11 0x00000000004e2468 in __cursor_row_search (leaf_foundp=0x0, leaf=0x0, insert=true, 
          cbt=0x3958bf0) at ../src/btree/bt_cursor.c:392
      #12 __wt_btcur_insert (cbt=cbt@entry=0x3958bf0) at ../src/btree/bt_cursor.c:855
      #13 0x0000000000428c97 in __curfile_insert (cursor=0x3958bf0) at ../src/cursor/cur_file.c:265
      #14 0x00000000004be51f in __txn_op_apply (r=r@entry=0x7fff073ec0c0, 
          lsnp=lsnp@entry=0x7fff073ebf90, pp=pp@entry=0x7fff073ebe88, end=<optimized out>, 
          end@entry=0x4a360d0 'a' <repeats 200 times>...) at ../src/txn/txn_recover.c:158
      #15 0x00000000004beebe in __txn_commit_apply (end=<optimized out>, pp=<optimized out>, 
          lsnp=<optimized out>, r=<optimized out>) at ../src/txn/txn_recover.c:300
      #16 __txn_log_recover (session=session@entry=0x2f5fd00, logrec=<optimized out>, 
          lsnp=lsnp@entry=0x7fff073ebf90, next_lsnp=next_lsnp@entry=0x7fff073ebf70, 
          cookie=cookie@entry=0x7fff073ec0c0, firstrecord=firstrecord@entry=0)
          at ../src/txn/txn_recover.c:345
      #17 0x0000000000451300 in __wt_log_scan (session=0x2f5fd00, lsnp=lsnp@entry=0x0, 
          flags=flags@entry=9, func=func@entry=0x4bedd0 <__txn_log_recover>, 
          cookie=cookie@entry=0x7fff073ec0c0) at ../src/log/log.c:2363
      #18 0x00000000004bfe3f in __wt_txn_recover (session=0x2f5fd00, session@entry=0x2f5f360)
          at ../src/txn/txn_recover.c:676
      #19 0x000000000041c587 in __wt_connection_workers (session=session@entry=0x2f5f360, 
          cfg=cfg@entry=0x7fff073ec310) at ../src/conn/conn_open.c:213
      #20 0x00000000004175b7 in wiredtiger_open (home=<optimized out>, 
          home@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", event_handler=event_handler@entry=0x0, 
          config=config@entry=0x0, connectionp=connectionp@entry=0x7fff073ec4d0)
          at ../src/conn/conn_api.c:2680
      #21 0x000000000040a566 in check_backup (
          backup_home=backup_home@entry=0x7fff073ee2e0 "WT_TEST.incr_backup.BACKUP", 
          backup_check=backup_check@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", 
          tinfo=0x7fff073ed6c0, tinfo=0x7fff073ed6c0) at ../../../test/csuite/incr_backup/main.c:708
      #22 0x0000000000409bd9 in main (argc=<optimized out>, argv=<optimized out>)
          at ../../../test/csuite/incr_backup/main.c:857
      

      This isn't too hard to repeat.  I ran test_incr_backup 50 times and hit three failures.  The other problem seeds were:

          ./test_incr_backup -S 5169869354081646233
          ./test_incr_backup -S 5675251266792091845
      

      This seems to have been introduced in the backports of WT-6136 and WT-6137.  At the previous commit (the backport of WT-5242 and friends) I don't see the problem.  

      I also don't see this problem (in 200+ runs) in develop or the 4.4 branch, so I expect that something went wrong with the backport to 4.2.

       

            Assignee:
            keith.smith@mongodb.com Keith Smith
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: