-
Type: Bug
-
Resolution: Duplicate
-
Priority: Blocker - P1
-
None
-
Affects Version/s: 4.2.7
-
Component/s: None
-
None
-
v4.4, v4.2
Running the incremental backup csuite test, test/csuite/incr_backup, produces occasional checksum mismatches running at the head of the WT 4.2 branch.
The problem occurs when the test reads a backup and gets a block of zeros instead of the data it expected. This suggests that a required block didn't get copied to the backup.
$ ./test_incr_backup -S 2142420118438890895
Seed: 2142420118438890895
[1591384524:413303][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_block_read_off, 283: t21-1.wt: read checksum error for 28672B block at offset 1093632: block header checksum of 0 doesn't match expected checksum of 0x48c0a49b
[1591384524:413426][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_bm_corrupt_dump, 135: {1093632, 28672, 0x48c0a49b}: (chunk 1 of 28): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
The error occurs while replaying the log when opening the backup:
(gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007fa935906801 in __GI_abort () at abort.c:79 #2 0x0000000000467b73 in __wt_abort (session=session@entry=0x2f5fd00) at ../src/os_common/os_abort.c:30 #3 0x0000000000404716 in __wt_panic (session=session@entry=0x2f5fd00) at ../src/support/err.c:497 #4 0x00000000005a3eeb in __wt_block_read_off (session=session@entry=0x2f5fd00, block=block@entry=0x394ea10, buf=buf@entry=0x7fff073eb9e0, offset=1093632, size=28672, checksum=1220584603) at ../src/block/block_read.c:292 #5 0x00000000005a402a in __wt_bm_read (bm=0x394e8e0, session=0x2f5fd00, buf=0x7fff073eb9e0, addr=<optimized out>, addr_size=<optimized out>) at ../src/block/block_read.c:99 #6 0x00000000004ef8f3 in __wt_bt_read (session=session@entry=0x2f5fd00, buf=buf@entry=0x7fff073eb9e0, addr=0x3956287 "\300ʇ\344H\300\204[!key-48-60\210\300ц\344uO\326\070!key-50-10\210\300\036\207\344\005\376-\225%key-53-100\210\300%\207\344\026\236S\331!key-56-40\210\300,\206\344\261\355[Y\031key-590\210\300E\207\344\315q\302\234%key-61-100\210\300L\204\344\022\364\034N\031key-630\210\300P\206\344L\v\274\374!key-65-20\210\300V\207\344\004\033?U!key-68-50\210\300]\207\344P\262D\267!key-71-20\210\300d\205\344\207\027\250a!key"..., addr_size=8) at ../src/btree/bt_io.c:38 #7 0x00000000004f8360 in __page_read (session=session@entry=0x2f5fd00, ref=ref@entry=0x3957d80, flags=flags@entry=1024) at ../src/btree/bt_read.c:480 #8 0x00000000004f96ee in __wt_page_in_func (session=session@entry=0x2f5fd00, ref=ref@entry=0x3957d80, flags=flags@entry=1024, func=func@entry=0x5ddd50 <__func__.41945> "__wt_row_search", line=line@entry=425) at ../src/btree/bt_read.c:640 #9 0x0000000000529a45 in __wt_page_swap_func (func=<synthetic pointer>, line=425, line=425, func=<synthetic pointer>, flags=1024, want=0x3957d80, held=0x394dc78, session=0x2f5fd00) at ../src/include/btree.i:1631 #10 __wt_row_search (cbt=cbt@entry=0x3958bf0, srch_key=srch_key@entry=0x3958cf0, insert=insert@entry=true, leaf=leaf@entry=0x0, leaf_safe=leaf_safe@entry=false, leaf_foundp=leaf_foundp@entry=0x0) at ../src/btree/row_srch.c:425 #11 0x00000000004e2468 in __cursor_row_search (leaf_foundp=0x0, leaf=0x0, insert=true, cbt=0x3958bf0) at ../src/btree/bt_cursor.c:392 #12 __wt_btcur_insert (cbt=cbt@entry=0x3958bf0) at ../src/btree/bt_cursor.c:855 #13 0x0000000000428c97 in __curfile_insert (cursor=0x3958bf0) at ../src/cursor/cur_file.c:265 #14 0x00000000004be51f in __txn_op_apply (r=r@entry=0x7fff073ec0c0, lsnp=lsnp@entry=0x7fff073ebf90, pp=pp@entry=0x7fff073ebe88, end=<optimized out>, end@entry=0x4a360d0 'a' <repeats 200 times>...) at ../src/txn/txn_recover.c:158 #15 0x00000000004beebe in __txn_commit_apply (end=<optimized out>, pp=<optimized out>, lsnp=<optimized out>, r=<optimized out>) at ../src/txn/txn_recover.c:300 #16 __txn_log_recover (session=session@entry=0x2f5fd00, logrec=<optimized out>, lsnp=lsnp@entry=0x7fff073ebf90, next_lsnp=next_lsnp@entry=0x7fff073ebf70, cookie=cookie@entry=0x7fff073ec0c0, firstrecord=firstrecord@entry=0) at ../src/txn/txn_recover.c:345 #17 0x0000000000451300 in __wt_log_scan (session=0x2f5fd00, lsnp=lsnp@entry=0x0, flags=flags@entry=9, func=func@entry=0x4bedd0 <__txn_log_recover>, cookie=cookie@entry=0x7fff073ec0c0) at ../src/log/log.c:2363 #18 0x00000000004bfe3f in __wt_txn_recover (session=0x2f5fd00, session@entry=0x2f5f360) at ../src/txn/txn_recover.c:676 #19 0x000000000041c587 in __wt_connection_workers (session=session@entry=0x2f5f360, cfg=cfg@entry=0x7fff073ec310) at ../src/conn/conn_open.c:213 #20 0x00000000004175b7 in wiredtiger_open (home=<optimized out>, home@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", event_handler=event_handler@entry=0x0, config=config@entry=0x0, connectionp=connectionp@entry=0x7fff073ec4d0) at ../src/conn/conn_api.c:2680 #21 0x000000000040a566 in check_backup ( backup_home=backup_home@entry=0x7fff073ee2e0 "WT_TEST.incr_backup.BACKUP", backup_check=backup_check@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", tinfo=0x7fff073ed6c0, tinfo=0x7fff073ed6c0) at ../../../test/csuite/incr_backup/main.c:708 #22 0x0000000000409bd9 in main (argc=<optimized out>, argv=<optimized out>) at ../../../test/csuite/incr_backup/main.c:857
This isn't too hard to repeat. I ran test_incr_backup 50 times and hit three failures. The other problem seeds were:
./test_incr_backup -S 5169869354081646233 ./test_incr_backup -S 5675251266792091845
This seems to have been introduced in the backports of WT-6136 and WT-6137. At the previous commit (the backport of WT-5242 and friends) I don't see the problem.
I also don't see this problem (in 200+ runs) in develop or the 4.4 branch, so I expect that something went wrong with the backport to 4.2.
- duplicates
-
WT-6215 Clear backup block information on rename: Reverted
- Closed