Hopefully all of these are caused by a few underlying bugs that are just manifesting in different ways. The Jenkins wiredtiger-test-format-stress job hit a segfault that is different from WT-896 and WT-898. Here's the CONFIG:
############################################
# RUN PARAMETERS
############################################
auto_throttle=1
firstfit=0
# bitcnt not applicable to this run
bloom=1
bloom_bit_count=55
bloom_hash_count=5
bloom_oldest=1
cache=95
checksum=uncompressed
chunk_size=1
compaction=0
compression=none
data_extend=0
data_source=table
delete_pct=2
dictionary=0
file_type=row-store
hot_backups=0
huffman_key=0
huffman_value=0
insert_pct=23
internal_key_truncation=1
internal_page_max=15
key_gap=0
key_max=127
key_min=12
leaf_page_max=9
merge_max=5
merge_threads=2
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=4
repeat_data_pct=66
reverse=0
rows=100000
runs=100
split_pct=59
statistics=0
threads=27
value_max=1461
value_min=16
# wiredtiger_config not applicable to this run
write_pct=55
############################################
Here's the stack:
(gdb) bt
#0  0x00000000004704c3 in __rec_row_int (session=0xcf10c0, r=0x7f627001c5e0,
    page=0x7f6264016210) at ../src/btree/rec_write.c:3141
#1  0x000000000046afd9 in __wt_rec_write (session=0xcf10c0,
    page=0x7f6264016210, salvage=0x0, flags=0) at ../src/btree/rec_write.c:362
#2  0x000000000044ff0d in __wt_sync_file (session=0xcf10c0, syncop=8)
    at ../src/btree/bt_evict.c:649
#3  0x000000000045fb68 in __wt_bt_cache_op (session=0xcf10c0,
    ckptbase=0x7f6270012a00, op=8) at ../src/btree/bt_sync.c:53
#4  0x0000000000449ad7 in __checkpoint_worker (session=0xcf10c0,
    cfg=0x7f62875fdcf0, is_checkpoint=1) at ../src/txn/txn_ckpt.c:750
#5  0x0000000000449ce1 in __wt_checkpoint (session=0xcf10c0,
    cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:802
#6  0x0000000000497c6e in __wt_meta_btree_apply (session=0xcf10c0,
    func=0x449c79 <__wt_checkpoint>, cfg=0x7f62875fdcf0)
    at ../src/meta/meta_apply.c:45
#7  0x0000000000448838 in __checkpoint_apply (session=0xcf10c0,
    cfg=0x7f62875fdcf0, op=0x449c79 <__wt_checkpoint>, fullp=0x0)
    at ../src/txn/txn_ckpt.c:129
#8  0x0000000000448b1e in __wt_txn_checkpoint (session=0xcf10c0,
    cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:243
#9  0x000000000043fe2d in __session_checkpoint (wt_session=0xcf10c0,
    config=0x7f62875fdd80 "name=thread-6") at ../src/session/session_api.c:716
#10 0x000000000040fb9c in ops (arg=0xd09890) at ../../../test/format/ops.c:272
#11 0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003788ae767d in clone () from /lib64/libc.so.6
The addr is NULL and we're dereferencing it at line 3141:
(gdb) list
3136     * cell type has been set in the case of page deletion requiring
3137     * a proxy cell, otherwise use the information from the addr or
3138     * original cell.
3139     */
3140    if (__wt_off_page(page, addr)) {
3141        p = addr->addr;
3142        size = addr->size;
3143        if (vtype == 0)
3144            vtype = __rec_vtype(addr);
3145    } else {
(gdb) p *page
$1 = {parent = 0x7f6288003610, ref = 0x7f62880037f8, u = {intl = {recno = 0,
      t = 0x7f6264016268}, row = {d = 0x0, ins = 0x7f6264016268, upd = 0x0},
    col_fix = {recno = 0, bitf = 0x7f6264016268 " \233\070$b\177"}, col_var = {
      recno = 0, d = 0x7f6264016268, repeats = 0x0, nrepeats = 0}},
  dsk = 0x7f6264011600, modify = 0x7f62348d6c60, read_gen = 101,
  memory_footprint = 32341, entries = 317, type = 6 '\006',
  flags_atomic = 2 '\002'}
(gdb) p *addr
Cannot access memory at address 0x0
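One plausible reason a NULL addr reaches line 3141 is that an "off page" test is a range check against the page's disk image, and NULL is trivially outside that range. The following is a minimal sketch under that assumption; the sketch_* types and functions are hypothetical stand-ins, not the WiredTiger definitions, and the guarded variant is only an illustration, not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real page structures. */
struct sketch_dsk {
    uint32_t mem_size;      /* size of the whole disk image, in bytes */
    uint8_t image[64];
};

struct sketch_page {
    struct sketch_dsk *dsk; /* disk image backing the page, may be NULL */
};

/*
 * Return nonzero if p does not point into the page's disk image.
 * NULL is never inside the image, so it is reported as "off page".
 */
static int
sketch_off_page(const struct sketch_page *page, const void *p)
{
    uintptr_t start, end, addr;

    assert(page != NULL);
    if (page->dsk == NULL)
        return (1);
    start = (uintptr_t)page->dsk;
    end = start + page->dsk->mem_size;
    addr = (uintptr_t)p;
    return (addr < start || addr >= end);
}

/* Guarded variant: only treat a non-NULL pointer as an off-page address. */
static int
sketch_off_page_checked(const struct sketch_page *page, const void *p)
{
    return (p != NULL && sketch_off_page(page, p));
}
```

With a range check like this, a caller that dereferences the pointer whenever the check succeeds will fault on NULL, which matches the crash at line 3141. A NULL guard only hides the symptom; the open question is still why addr was NULL in the first place.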
Interestingly, addr (which is NULL and, AFAICT, is not changed after it is read) comes from ref->addr, yet ref->addr is not NULL when we crash. Likewise, rp comes from ref->page and holds an address, but ref->page is NULL when we crash. Perhaps we're racing with another thread that is modifying the ref.
(gdb) p *ref
$2 = {page = 0x0, addr = 0x7f620802f920, key = {recno = 38654723707,
    ikey = 0x90000467b, pkey = 38654723707}, txnid = 0, state = WT_REF_DISK,
  unused = 0}
(gdb) p rp
$7 = (WT_PAGE *) 0x7f622801ed20
(gdb) p *rp
$8 = {parent = 0xabababababababab, ref = 0xabababababababab, u = {intl = {
      recno = 12370169555311111083, t = 0xabababababababab}, row = {
      d = 0xabababababababab, ins = 0xabababababababab,
      upd = 0xabababababababab}, col_fix = {recno = 12370169555311111083,
      bitf = 0xabababababababab <Address 0xabababababababab out of bounds>},
    col_var = {recno = 12370169555311111083, d = 0xabababababababab,
      repeats = 0xabababababababab, nrepeats = 2880154539}},
  dsk = 0xabababababababab, modify = 0xabababababababab,
  read_gen = 12370169555311111083, memory_footprint = 12370169555311111083,
  entries = 2880154539, type = 171 '\253', flags_atomic = 171 '\253'}
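Every field of *rp is the byte 0xab repeated (12370169555311111083 is 0xabababababababab, and 2880154539 is 0xabababab), which looks like memory that was overwritten after being freed. That reading assumes the build fills freed memory with a poison byte; the report does not state this. A small hypothetical helper for recognizing the pattern could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fill byte, taken from the gdb output of *rp. */
#define SKETCH_POISON_BYTE 0xab

/* Return 1 if every byte of the object equals the poison byte. */
static int
sketch_is_poisoned(const void *p, size_t len)
{
    const uint8_t *b = p;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < len; ++i)
        if (b[i] != SKETCH_POISON_BYTE)
            return (0);
    return (len != 0);
}
```

If rp really points at freed memory, the page was discarded while the checkpoint thread still held a pointer to it, which is consistent with ref->page being NULL and the state being WT_REF_DISK at the time of the crash.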
Most application threads are waiting on an application lock g.backup_lock.
The eviction server has an interesting stack. Perhaps we raced eviction:
Thread 12 (Thread 0x7f628f7c1700 (LWP 32662)):
#0  __wt_page_hazard_check (session=0xcefa40, page=0x7f625c1a8c10)
    at ../src/include/btree.i:732
#1  0x00000000004666e7 in __hazard_exclusive (session=0xcefa40,
    ref=0x7f6264017ac8, top=1) at ../src/btree/rec_evict.c:562
#2  0x00000000004660c1 in __rec_review (session=0xcefa40, ref=0x7f6264017ac8,
    page=0x7f625c1a8c10, exclusive=0, merge=0, top=1,
    inmem_split=0x7f628f7c0c90, istree=0x7f628f7c0c8c)
    at ../src/btree/rec_evict.c:306
#3  0x000000000046593b in __wt_rec_evict (session=0xcefa40,
    page=0x7f625c1a8c10, exclusive=0) at ../src/btree/rec_evict.c:62
#4  0x000000000044f933 in __wt_evict_page (session=0xcefa40,
    page=0x7f625c1a8c10) at ../src/btree/bt_evict.c:403
#5  0x0000000000450f89 in __wt_evict_lru_page (session=0xcefa40, is_app=0)
    at ../src/btree/bt_evict.c:1242
#6  0x00000000004502d9 in __evict_lru (session=0xcefa40, flags=2)
    at ../src/btree/bt_evict.c:762
#7  0x000000000044f595 in __evict_worker (session=0xcefa40)
    at ../src/btree/bt_evict.c:272
#8  0x000000000044f104 in __wt_cache_evict_server (arg=0xcefa40)
    at ../src/btree/bt_evict.c:164
#9  0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003788ae767d in clone () from /lib64/libc.so.6
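If the checkpoint thread did race eviction of a child page, the usual protection is a hazard reference: the reader publishes the page it is about to use, and the evictor refuses to discard any page that is published. The following is a generic sketch of that protocol using C11 atomics; every sketch_* name is hypothetical, and it is not the WiredTiger implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_HAZARD_SLOTS 4

enum { SKETCH_MEM, SKETCH_LOCKED, SKETCH_DISK };

struct sketch_ref {
    _Atomic int state;      /* SKETCH_MEM, SKETCH_LOCKED or SKETCH_DISK */
    _Atomic(void *) page;   /* in-memory page, NULL once evicted */
};

struct sketch_session {
    _Atomic(void *) hazard[SKETCH_HAZARD_SLOTS];
};

static void
sketch_ref_init(struct sketch_ref *ref, void *page)
{
    assert(page != NULL);
    atomic_init(&ref->state, SKETCH_MEM);
    atomic_init(&ref->page, page);
}

static void
sketch_session_init(struct sketch_session *s)
{
    size_t i;

    for (i = 0; i < SKETCH_HAZARD_SLOTS; ++i)
        atomic_init(&s->hazard[i], NULL);
}

/* Reader: publish the page, then re-check the ref. Returns 1 on success. */
static int
sketch_hazard_set(struct sketch_session *s, struct sketch_ref *ref)
{
    void *page;
    size_t i;

    if (atomic_load(&ref->state) != SKETCH_MEM)
        return (0);
    page = atomic_load(&ref->page);
    for (i = 0; i < SKETCH_HAZARD_SLOTS; ++i) {
        if (atomic_load(&s->hazard[i]) != NULL)
            continue;
        atomic_store(&s->hazard[i], page);
        /* The evictor may have locked the ref in the meantime. */
        if (atomic_load(&ref->state) == SKETCH_MEM &&
            atomic_load(&ref->page) == page)
            return (1);
        atomic_store(&s->hazard[i], NULL);
        return (0);
    }
    return (0); /* no free slot */
}

/* Reader: drop the hazard reference on a page. */
static void
sketch_hazard_clear(struct sketch_session *s, void *page)
{
    size_t i;

    for (i = 0; i < SKETCH_HAZARD_SLOTS; ++i)
        if (atomic_load(&s->hazard[i]) == page) {
            atomic_store(&s->hazard[i], NULL);
            return;
        }
}

/* Evictor: lock the ref, then give up if any session published the page. */
static int
sketch_evict_try(struct sketch_ref *ref, struct sketch_session *ss, size_t n)
{
    void *page;
    size_t i, j;
    int expected = SKETCH_MEM;

    if (!atomic_compare_exchange_strong(&ref->state, &expected, SKETCH_LOCKED))
        return (0);
    page = atomic_load(&ref->page);
    for (i = 0; i < n; ++i)
        for (j = 0; j < SKETCH_HAZARD_SLOTS; ++j)
            if (atomic_load(&ss[i].hazard[j]) == page) {
                atomic_store(&ref->state, SKETCH_MEM);
                return (0);
            }
    atomic_store(&ref->page, NULL);
    atomic_store(&ref->state, SKETCH_DISK);
    return (1);
}
```

In this sketch, a thread reconciling a parent would call sketch_hazard_set on each child ref before touching the child page and sketch_hazard_clear afterwards. The crash above looks like the case where the reader skipped that step: it kept a raw page pointer while the ref moved to the on-disk state underneath it.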
is related to
- WT-896 test/format segfault (Closed)
- WT-898 test/format WT_CHILD_MODIFIED failure (Closed)

related to
- WT-1 placeholder WT-1 (Closed)
- WT-2 What does metadata look like? (Closed)
- WT-3 What file formats are required? (Closed)
- WT-4 Flexible cursor traversals (Closed)
- WT-5 How does pget work: is it necessary? (Closed)
- WT-6 Complex schema example (Closed)
- WT-7 Do we need the handle->err/errx methods? (Closed)
- WT-9 Does adding schema need to be transactional? (Closed)
- WT-10 Basic "getting started" tutorial (Closed)
- WT-11 placeholder #11 (Closed)
- WT-12 Write more examples (Closed)
- WT-900 Use hazard references to prevent child pages being evicted during reconciliation of their parent (Closed)