Details
-
Bug
-
Status: Closed
-
Blocker - P1
-
Resolution: Duplicate
-
4.2.7
-
None
-
None
-
None
-
v4.4, v4.2
Description
Running the incremental backup csuite test, test/csuite/incr_backup, produces occasional checksum mismatches running at the head of the WT 4.2 branch.
The problem occurs when the test reads a backup and gets a block of zeros instead of the data it expected. This suggests that a required block didn't get copied to the backup.
$ ./test_incr_backup -S 2142420118438890895 |
Seed: 2142420118438890895 |
[1591384524:413303][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_block_read_off, 283: t21-1.wt: read checksum error for 28672B block at offset 1093632: block header checksum of 0 doesn't match expected checksum of 0x48c0a49b |
[1591384524:413426][3184:0x7fa936dab740], file:t21-1.wt, WT_CURSOR.insert: __wt_bm_corrupt_dump, 135: {1093632, 28672, 0x48c0a49b}: (chunk 1 of 28): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... |
The error occurs while replaying the log when opening the backup:
(gdb) bt
|
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 |
#1 0x00007fa935906801 in __GI_abort () at abort.c:79 |
#2 0x0000000000467b73 in __wt_abort (session=session@entry=0x2f5fd00) |
at ../src/os_common/os_abort.c:30 |
#3 0x0000000000404716 in __wt_panic (session=session@entry=0x2f5fd00) at ../src/support/err.c:497 |
#4 0x00000000005a3eeb in __wt_block_read_off (session=session@entry=0x2f5fd00, |
block=block@entry=0x394ea10, buf=buf@entry=0x7fff073eb9e0, offset=1093632, size=28672, |
checksum=1220584603) at ../src/block/block_read.c:292 |
#5 0x00000000005a402a in __wt_bm_read (bm=0x394e8e0, session=0x2f5fd00, buf=0x7fff073eb9e0, |
addr=<optimized out>, addr_size=<optimized out>) at ../src/block/block_read.c:99 |
#6 0x00000000004ef8f3 in __wt_bt_read (session=session@entry=0x2f5fd00, |
buf=buf@entry=0x7fff073eb9e0, |
addr=0x3956287 "\300ʇ\344H\300\204[!key-48-60\210\300ц\344uO\326\070!key-50-10\210\300\036\207\344\005\376-\225%key-53-100\210\300%\207\344\026\236S\331!key-56-40\210\300,\206\344\261\355[Y\031key-590\210\300E\207\344\315q\302\234%key-61-100\210\300L\204\344\022\364\034N\031key-630\210\300P\206\344L\v\274\374!key-65-20\210\300V\207\344\004\033?U!key-68-50\210\300]\207\344P\262D\267!key-71-20\210\300d\205\344\207\027\250a!key"..., addr_size=8) at ../src/btree/bt_io.c:38 |
#7 0x00000000004f8360 in __page_read (session=session@entry=0x2f5fd00, ref=ref@entry=0x3957d80, |
flags=flags@entry=1024) at ../src/btree/bt_read.c:480 |
#8 0x00000000004f96ee in __wt_page_in_func (session=session@entry=0x2f5fd00, |
ref=ref@entry=0x3957d80, flags=flags@entry=1024, |
func=func@entry=0x5ddd50 <__func__.41945> "__wt_row_search", line=line@entry=425) |
at ../src/btree/bt_read.c:640 |
#9 0x0000000000529a45 in __wt_page_swap_func (func=<synthetic pointer>, line=425, line=425, |
func=<synthetic pointer>, flags=1024, want=0x3957d80, held=0x394dc78, session=0x2f5fd00) |
at ../src/include/btree.i:1631 |
#10 __wt_row_search (cbt=cbt@entry=0x3958bf0, srch_key=srch_key@entry=0x3958cf0, |
insert=insert@entry=true, leaf=leaf@entry=0x0, leaf_safe=leaf_safe@entry=false, |
leaf_foundp=leaf_foundp@entry=0x0) at ../src/btree/row_srch.c:425 |
#11 0x00000000004e2468 in __cursor_row_search (leaf_foundp=0x0, leaf=0x0, insert=true, |
cbt=0x3958bf0) at ../src/btree/bt_cursor.c:392 |
#12 __wt_btcur_insert (cbt=cbt@entry=0x3958bf0) at ../src/btree/bt_cursor.c:855 |
#13 0x0000000000428c97 in __curfile_insert (cursor=0x3958bf0) at ../src/cursor/cur_file.c:265 |
#14 0x00000000004be51f in __txn_op_apply (r=r@entry=0x7fff073ec0c0, |
lsnp=lsnp@entry=0x7fff073ebf90, pp=pp@entry=0x7fff073ebe88, end=<optimized out>, |
end@entry=0x4a360d0 'a' <repeats 200 times>...) at ../src/txn/txn_recover.c:158 |
#15 0x00000000004beebe in __txn_commit_apply (end=<optimized out>, pp=<optimized out>, |
lsnp=<optimized out>, r=<optimized out>) at ../src/txn/txn_recover.c:300 |
#16 __txn_log_recover (session=session@entry=0x2f5fd00, logrec=<optimized out>, |
lsnp=lsnp@entry=0x7fff073ebf90, next_lsnp=next_lsnp@entry=0x7fff073ebf70, |
cookie=cookie@entry=0x7fff073ec0c0, firstrecord=firstrecord@entry=0) |
at ../src/txn/txn_recover.c:345 |
#17 0x0000000000451300 in __wt_log_scan (session=0x2f5fd00, lsnp=lsnp@entry=0x0, |
flags=flags@entry=9, func=func@entry=0x4bedd0 <__txn_log_recover>, |
cookie=cookie@entry=0x7fff073ec0c0) at ../src/log/log.c:2363 |
#18 0x00000000004bfe3f in __wt_txn_recover (session=0x2f5fd00, session@entry=0x2f5f360) |
at ../src/txn/txn_recover.c:676 |
#19 0x000000000041c587 in __wt_connection_workers (session=session@entry=0x2f5f360, |
cfg=cfg@entry=0x7fff073ec310) at ../src/conn/conn_open.c:213 |
#20 0x00000000004175b7 in wiredtiger_open (home=<optimized out>, |
home@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", event_handler=event_handler@entry=0x0, |
config=config@entry=0x0, connectionp=connectionp@entry=0x7fff073ec4d0) |
at ../src/conn/conn_api.c:2680 |
#21 0x000000000040a566 in check_backup ( |
backup_home=backup_home@entry=0x7fff073ee2e0 "WT_TEST.incr_backup.BACKUP", |
backup_check=backup_check@entry=0x7fff073edee0 "WT_TEST.incr_backup.CHECK", |
tinfo=0x7fff073ed6c0, tinfo=0x7fff073ed6c0) at ../../../test/csuite/incr_backup/main.c:708 |
#22 0x0000000000409bd9 in main (argc=<optimized out>, argv=<optimized out>) |
at ../../../test/csuite/incr_backup/main.c:857 |
This isn't too hard to repeat. I ran test_incr_backup 50 times and hit three failures. The other problem seeds were:
./test_incr_backup -S 5169869354081646233 |
./test_incr_backup -S 5675251266792091845 |
This seems to have been introduced in the backports of WT-6136 and WT-6137. At the previous commit (the backport of WT-5242 and friends) I don't see the problem.
I also don't see this problem (in 200+ runs) in develop or the 4.4 branch, so I expect that something went wrong with the backport to 4.2.
Attachments
Issue Links
- duplicates
-
WT-6215 Clear backup block information on rename: Reverted
-
- Closed
-