[SERVER-41667] Disable 'wt_repair_corrupt_metadata.js' on debug builds Created: 12/Jun/19  Updated: 29/Oct/23  Resolved: 13/Jun/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.2.0-rc2, 4.3.1

Type: Bug Priority: Minor - P4
Reporter: Keith Bostic (Inactive) Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to WT-4317 Read checksum error in test_wt4156_me... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Steps To Reproduce:

The related BF-13542 test failure is currently reliable and I would expect it to continue failing.

Sprint: Execution Team 2019-06-17
Participants:
Linked BF Score: 50

 Description   

The wt_repair_corrupt_metadata.js test corrupts the database (by copying in a turtle file that doesn't match the other files) and then asserts that repair succeeds. When the turtle file doesn't match the other files, there's no reason to believe it will be possible to read the blocks in the WiredTiger.wt file referenced by the turtle file's checkpoints, and not being able to read a block is normally a fatal error in the WiredTiger library.

Because this is a HAVE_DIAGNOSTIC WiredTiger build, WiredTiger aborts and drops core.

Here's the core dump stack trace:

(gdb) where
#0  0x00007ff9ec0e34f5 in raise () from /lib64/libc.so.6
#1  0x00007ff9ec0e4cd5 in abort () from /lib64/libc.so.6
#2  0x00007ff9ee3b68f7 in __wt_abort (session=session@entry=0x7ff9e583e020)
    at src/third_party/wiredtiger/src/os_common/os_abort.c:32
#3  0x00007ff9ee3be654 in __wt_panic (session=session@entry=0x7ff9e583e020)
    at src/third_party/wiredtiger/src/support/err.c:527
#4  0x00007ff9ee3d638d in __wt_block_read_off (
    session=session@entry=0x7ff9e583e020, block=block@entry=0x7ff9e9de4820, 
    buf=0x7ff9ea48c0a0, offset=24576, size=4096, checksum=429975725)
    at src/third_party/wiredtiger/src/block/block_read.c:312
#5  0x00007ff9ee8e59da in __wt_block_extlist_read (session=0x7ff9e583e020, 
    block=0x7ff9e9de4820, el=0x7ff9e9de49d0, ckpt_size=28672)
    at src/third_party/wiredtiger/src/block/block_ext.c:1186
#6  0x00007ff9ee8e5f24 in __wt_block_extlist_read_avail (
    session=session@entry=0x7ff9e583e020, block=block@entry=0x7ff9e9de4820, 
    el=el@entry=0x7ff9e9de49d0, ckpt_size=28672)
    at src/third_party/wiredtiger/src/block/block_ext.c:1149
#7  0x00007ff9ee8e09f7 in __wt_block_checkpoint_load (
    session=session@entry=0x7ff9e583e020, block=0x7ff9e9de4820, 
    addr=<optimized out>, addr_size=31, 
    root_addr=0x7fff5bf6fad0 "\203\201\344\261h\300-", 
    root_addr_sizep=0x7fff5bf6f980, checkpoint=false)
    at src/third_party/wiredtiger/src/block/block_ckpt.c:116
#8  0x00007ff9ee8c6f67 in __bm_checkpoint_load (bm=0x7ff9e9d489a0, 
    session=0x7ff9e583e020, addr=<optimized out>, addr_size=<optimized out>, 
    root_addr=<optimized out>, root_addr_sizep=<optimized out>, 
    checkpoint=false) at src/third_party/wiredtiger/src/block/block_mgr.c:109
#9  0x00007ff9ee812341 in __wt_btree_open (
    session=session@entry=0x7ff9e583e020, op_cfg=op_cfg@entry=0x7fff5bf6ffb0)
    at src/third_party/wiredtiger/src/btree/bt_handle.c:188
#10 0x00007ff9ee7863e6 in __wt_conn_dhandle_open (
    session=session@entry=0x7ff9e583e020, cfg=cfg@entry=0x7fff5bf6ffb0, 
    flags=flags@entry=0)
    at src/third_party/wiredtiger/src/conn/conn_dhandle.c:472
#11 0x00007ff9ee7dc97d in __wt_session_get_dhandle (session=0x7ff9e583e020, 
    uri=0x7ff9f0396743 "file:WiredTiger.wt", checkpoint=0x0, 
    cfg=0x7fff5bf6ffb0, flags=0)
    at src/third_party/wiredtiger/src/session/session_dhandle.c:545
#12 0x00007ff9ee7dcf93 in __wt_session_get_dhandle (session=0x7ff9e583e020, 
    uri=<optimized out>, checkpoint=0x0, cfg=<optimized out>, flags=0)
    at src/third_party/wiredtiger/src/session/session_dhandle.c:537
#13 0x00007ff9ee7dd174 in __wt_session_get_btree_ckpt (
    session=session@entry=0x7ff9e583e020, 
    uri=uri@entry=0x7ff9f0396743 "file:WiredTiger.wt", 
    cfg=cfg@entry=0x7fff5bf6ffb0, flags=flags@entry=0)
    at src/third_party/wiredtiger/src/session/session_dhandle.c:350
#14 0x00007ff9ee86b051 in __wt_curfile_open (
    session=session@entry=0x7ff9e583e020, uri=<optimized out>, 
    uri@entry=0x7ff9f0396743 "file:WiredTiger.wt", owner=owner@entry=0x0, 
    cfg=cfg@entry=0x7fff5bf6ffb0, cursorp=cursorp@entry=0x7fff5bf70000)
    at src/third_party/wiredtiger/src/cursor/cur_file.c:828
#15 0x00007ff9ee7d826f in __session_open_cursor_int (session=0x7ff9e583e020, 
    uri=<optimized out>, owner=0x0, other=<optimized out>, cfg=0x7fff5bf6ffb0, 
    cursorp=0x7fff5bf70000)
    at src/third_party/wiredtiger/src/session/session_api.c:485
#16 0x00007ff9ee7a684e in __wt_metadata_cursor_open (
    session=session@entry=0x7ff9e583e020, config=config@entry=0x0, 
    cursorp=cursorp@entry=0x7fff5bf70000)
    at src/third_party/wiredtiger/src/meta/meta_table.c:69
#17 0x00007ff9ee7a693b in __wt_metadata_cursor (
    session=session@entry=0x7ff9e583e020, cursorp=cursorp@entry=0x0)
    at src/third_party/wiredtiger/src/meta/meta_table.c:115
#18 0x00007ff9ee784f6b in wiredtiger_open (home=<optimized out>, 
    event_handler=event_handler@entry=0x7ff9e9dff088, config=<optimized out>, 
    connectionp=connectionp@entry=0x7ff9e9dff078)
    at src/third_party/wiredtiger/src/conn/conn_api.c:2839

I believe this is a test issue.

There's no reason to believe a turtle or WiredTiger.wt file that doesn't match the rest of the database files can be used: any error you can imagine is possible, and there are no operations that can be safely or reliably attempted.
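
As a rough illustration of why mismatched metadata leads to the read failure in frame #4 of the trace, here is a toy model in JavaScript. It is only a sketch: every name in it (adler32, writeBlock, readBlock, tryRead) is hypothetical and none of it is WiredTiger code. Adler-32 is used purely as a stand-in checksum, and the (offset, size, checksum) triple mirrors the arguments visible in the __wt_block_read_off frame.

```javascript
// Toy model only: these helpers are hypothetical, not WiredTiger APIs.

// Adler-32, used here as a simple stand-in for the real block checksum.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Write a payload into the file and return the address that metadata
// would record for it: offset, size, and checksum.
function writeBlock(file, offset, payload) {
  file.set(payload, offset);
  return { offset, size: payload.length, checksum: adler32(payload) };
}

// Read a block and verify it against the address recorded in metadata.
function readBlock(file, addr) {
  const bytes = file.subarray(addr.offset, addr.offset + addr.size);
  if (bytes.length !== addr.size || adler32(bytes) !== addr.checksum) {
    throw new Error(
      `read checksum error at offset ${addr.offset}, size ${addr.size}`);
  }
  return bytes;
}

// Return "ok" on success, or the error message on failure.
function tryRead(file, addr) {
  try {
    readBlock(file, addr);
    return "ok";
  } catch (e) {
    return e.message;
  }
}

// A data file together with the metadata that was written alongside it.
const dataFile = new Uint8Array(64);
const goodAddr = writeBlock(dataFile, 16, Uint8Array.from([1, 2, 3, 4]));

// Metadata taken from a different file: it describes a block that
// dataFile never contained.
const otherFile = new Uint8Array(64);
const staleAddr = writeBlock(otherFile, 32, Uint8Array.from([9, 9, 9, 9]));

console.log(tryRead(dataFile, goodAddr));
console.log(tryRead(dataFile, staleAddr));
```

The first read succeeds because the address and the file were written together. The second fails because the address came from elsewhere. In this model the failure is a catchable exception; in the diagnostic build described above, the equivalent condition ends in an abort.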

The storage group recently removed some WiredTiger standalone testing that attempted similar operations as part of WT-4317. Of possible interest is this comment that discusses some of the issues: https://jira.mongodb.org/browse/WT-4317?focusedCommentId=2201301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-220130.



 Comments   
Comment by Githook User [ 13/Jun/19 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-41667 Disable 'wt_repair_corrupt_metadata.js' on debug builds

(cherry picked from commit 2581b5d745009e17ac5a94ea7fe1e9a41b7905ed)
Branch: v4.2
https://github.com/mongodb/mongo/commit/f92115cad9d2a4c2ddcf3c2c65092dda2fd7147a

Comment by Githook User [ 13/Jun/19 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-41667 Disable 'wt_repair_corrupt_metadata.js' on debug builds
Branch: master
https://github.com/mongodb/mongo/commit/2581b5d745009e17ac5a94ea7fe1e9a41b7905ed

Comment by Brian Lane [ 13/Jun/19 ]

Thanks louis.williams for the writeup.

FYI - milkie can we get the suggested changes in ASAP as luke.chen is currently blocked doing a WT drop for master and 4.2.

Cheers!

-Brian

Comment by Louis Williams [ 12/Jun/19 ]

At the moment, this is only failing on debug builds because WT diagnostic mode is enabled. This test also depends on WT behavior that isn't guaranteed. I discussed with Keith and I think we should do the following for now:

  1. Disable the test on debug builds (we already do when --nojournal is enabled)
  2. Leave a conspicuous comment in the test reminding maintainers that the test characterizes the current WT salvage behavior but may be subject to change in the future.

If future changes make this test more unreliable, because it tests behavior that WT makes no guarantees about, we can reevaluate whether the test is necessary.
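
The two steps proposed above might look roughly like the following sketch. It is not the code from the actual commit: skipReason, runRepairTest, and the options.nojournal flag are hypothetical names chosen for illustration. The only server fact it relies on is that the document returned by the buildInfo command has a boolean debug field; in a shell-based test that document would presumably come from something like db.adminCommand("buildInfo").

```javascript
// Hypothetical helpers, not taken from the actual commit.

// Decide whether the test should be skipped. `buildInfo` is the document
// returned by the server's buildInfo command (its `debug` field is true on
// debug builds). `options.nojournal` is an assumed flag describing how the
// test run was configured.
function skipReason(buildInfo, options) {
  if (options.nojournal) {
    return "running with --nojournal";
  }
  if (buildInfo.debug) {
    return "debug build, where WT diagnostic mode is enabled";
  }
  return null;
}

// NOTE for maintainers: the guarded test characterizes the current WT
// salvage behavior, which WT does not guarantee and which may change.
function runRepairTest(buildInfo, options, testBody, log = console.log) {
  const reason = skipReason(buildInfo, options);
  if (reason !== null) {
    log(`Skipping test: ${reason}`);
    return false;
  }
  testBody();
  return true;
}
```

Keeping the skip decision in a small pure function makes it easy to see which configurations are excluded, and the comment above the runner is where the reminder from step 2 would live.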
