Hopefully all of these are caused by a few underlying bugs that are just manifesting in different ways. The Jenkins job wiredtiger-test-format-stress hit a segfault that differs from WT-896 and WT-898. Here's the CONFIG:
############################################
# RUN PARAMETERS
############################################
auto_throttle=1
firstfit=0
# bitcnt not applicable to this run
bloom=1
bloom_bit_count=55
bloom_hash_count=5
bloom_oldest=1
cache=95
checksum=uncompressed
chunk_size=1
compaction=0
compression=none
data_extend=0
data_source=table
delete_pct=2
dictionary=0
file_type=row-store
hot_backups=0
huffman_key=0
huffman_value=0
insert_pct=23
internal_key_truncation=1
internal_page_max=15
key_gap=0
key_max=127
key_min=12
leaf_page_max=9
merge_max=5
merge_threads=2
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=4
repeat_data_pct=66
reverse=0
rows=100000
runs=100
split_pct=59
statistics=0
threads=27
value_max=1461
value_min=16
# wiredtiger_config not applicable to this run
write_pct=55
############################################
Here's the stack:
(gdb) bt
#0  0x00000000004704c3 in __rec_row_int (session=0xcf10c0, r=0x7f627001c5e0,
    page=0x7f6264016210) at ../src/btree/rec_write.c:3141
#1  0x000000000046afd9 in __wt_rec_write (session=0xcf10c0,
    page=0x7f6264016210, salvage=0x0, flags=0) at ../src/btree/rec_write.c:362
#2  0x000000000044ff0d in __wt_sync_file (session=0xcf10c0, syncop=8)
    at ../src/btree/bt_evict.c:649
#3  0x000000000045fb68 in __wt_bt_cache_op (session=0xcf10c0,
    ckptbase=0x7f6270012a00, op=8) at ../src/btree/bt_sync.c:53
#4  0x0000000000449ad7 in __checkpoint_worker (session=0xcf10c0,
    cfg=0x7f62875fdcf0, is_checkpoint=1) at ../src/txn/txn_ckpt.c:750
#5  0x0000000000449ce1 in __wt_checkpoint (session=0xcf10c0,
    cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:802
#6  0x0000000000497c6e in __wt_meta_btree_apply (session=0xcf10c0,
    func=0x449c79 <__wt_checkpoint>, cfg=0x7f62875fdcf0)
    at ../src/meta/meta_apply.c:45
#7  0x0000000000448838 in __checkpoint_apply (session=0xcf10c0,
    cfg=0x7f62875fdcf0, op=0x449c79 <__wt_checkpoint>, fullp=0x0)
    at ../src/txn/txn_ckpt.c:129
#8  0x0000000000448b1e in __wt_txn_checkpoint (session=0xcf10c0,
    cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:243
#9  0x000000000043fe2d in __session_checkpoint (wt_session=0xcf10c0,
    config=0x7f62875fdd80 "name=thread-6") at ../src/session/session_api.c:716
#10 0x000000000040fb9c in ops (arg=0xd09890) at ../../../test/format/ops.c:272
#11 0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003788ae767d in clone () from /lib64/libc.so.6
The addr is NULL and we're dereferencing it at line 3141:
(gdb) list
3136     * cell type has been set in the case of page deletion requiring
3137     * a proxy cell, otherwise use the information from the addr or
3138     * original cell.
3139     */
3140    if (__wt_off_page(page, addr)) {
3141        p = addr->addr;
3142        size = addr->size;
3143        if (vtype == 0)
3144            vtype = __rec_vtype(addr);
3145    } else {
(gdb) p *page
$1 = {parent = 0x7f6288003610, ref = 0x7f62880037f8, u = {intl = {recno = 0,
      t = 0x7f6264016268}, row = {d = 0x0, ins = 0x7f6264016268, upd = 0x0},
    col_fix = {recno = 0, bitf = 0x7f6264016268 " \233\070$b\177"}, col_var = {
      recno = 0, d = 0x7f6264016268, repeats = 0x0, nrepeats = 0}},
  dsk = 0x7f6264011600, modify = 0x7f62348d6c60, read_gen = 101,
  memory_footprint = 32341, entries = 317, type = 6 '\006',
  flags_atomic = 2 '\002'}
(gdb) p *addr
Cannot access memory at address 0x0
It is interesting to note that addr, which is NULL and AFAICT never changes after it is set, comes from ref->addr, yet ref->addr is not NULL when we crash. Similarly, rp comes from ref->page and holds an address, but ref->page is NULL when we crash. Perhaps we're racing with another thread that is modifying the ref.
(gdb) p *ref
$2 = {page = 0x0, addr = 0x7f620802f920, key = {recno = 38654723707,
    ikey = 0x90000467b, pkey = 38654723707}, txnid = 0, state = WT_REF_DISK,
  unused = 0}
(gdb) p rp
$7 = (WT_PAGE *) 0x7f622801ed20
(gdb) p *rp
$8 = {parent = 0xabababababababab, ref = 0xabababababababab, u = {intl = {
      recno = 12370169555311111083, t = 0xabababababababab}, row = {
      d = 0xabababababababab, ins = 0xabababababababab,
      upd = 0xabababababababab}, col_fix = {recno = 12370169555311111083,
      bitf = 0xabababababababab <Address 0xabababababababab out of bounds>},
    col_var = {recno = 12370169555311111083, d = 0xabababababababab,
      repeats = 0xabababababababab, nrepeats = 2880154539}},
  dsk = 0xabababababababab, modify = 0xabababababababab,
  read_gen = 12370169555311111083, memory_footprint = 12370169555311111083,
  entries = 2880154539, type = 171 '\253', flags_atomic = 171 '\253'}
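Every field of *rp in the dump is the single byte 0xab repeated (12370169555311111083 is 0xabababababababab, 2880154539 is 0xabababab, and 171 is 0xab). A uniform fill like this usually means the memory was overwritten after being freed, though that is an inference from the dump rather than something confirmed here. A minimal sketch for checking a dumped value against the pattern; `TOY_FILL_BYTE` and `toy_is_fill_pattern` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Fill byte as observed in the dump of *rp; the name is hypothetical. */
#define TOY_FILL_BYTE 0xab

/*
 * toy_is_fill_pattern --
 *     Return 1 if the low nbytes bytes of v are all TOY_FILL_BYTE,
 *     0 otherwise. nbytes must be between 1 and 8.
 */
static int
toy_is_fill_pattern(uint64_t v, unsigned nbytes)
{
    unsigned i;

    for (i = 0; i < nbytes; i++)
        if (((v >> (8 * i)) & 0xff) != TOY_FILL_BYTE)
            return (0);
    return (1);
}
```

Called with the decimal values gdb printed for read_gen (8 bytes), entries (4 bytes) and type (1 byte), the check reports the fill pattern in each case, whereas the live page's read_gen of 101 does not match.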
Most application threads are waiting on g.backup_lock, a lock owned by the test application.
The eviction server has an interesting stack. Perhaps we raced with eviction:
Thread 12 (Thread 0x7f628f7c1700 (LWP 32662)):
#0  __wt_page_hazard_check (session=0xcefa40, page=0x7f625c1a8c10)
    at ../src/include/btree.i:732
#1  0x00000000004666e7 in __hazard_exclusive (session=0xcefa40,
    ref=0x7f6264017ac8, top=1) at ../src/btree/rec_evict.c:562
#2  0x00000000004660c1 in __rec_review (session=0xcefa40, ref=0x7f6264017ac8,
    page=0x7f625c1a8c10, exclusive=0, merge=0, top=1,
    inmem_split=0x7f628f7c0c90, istree=0x7f628f7c0c8c)
    at ../src/btree/rec_evict.c:306
#3  0x000000000046593b in __wt_rec_evict (session=0xcefa40,
    page=0x7f625c1a8c10, exclusive=0) at ../src/btree/rec_evict.c:62
#4  0x000000000044f933 in __wt_evict_page (session=0xcefa40,
    page=0x7f625c1a8c10) at ../src/btree/bt_evict.c:403
#5  0x0000000000450f89 in __wt_evict_lru_page (session=0xcefa40, is_app=0)
    at ../src/btree/bt_evict.c:1242
#6  0x00000000004502d9 in __evict_lru (session=0xcefa40, flags=2)
    at ../src/btree/bt_evict.c:762
#7  0x000000000044f595 in __evict_worker (session=0xcefa40)
    at ../src/btree/bt_evict.c:272
#8  0x000000000044f104 in __wt_cache_evict_server (arg=0xcefa40)
    at ../src/btree/bt_evict.c:164
#9  0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003788ae767d in clone () from /lib64/libc.so.6
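The suspected interleaving can be replayed deterministically in a single thread: the reconciling thread copies ref->page and ref->addr, eviction then writes the child out and detaches it, and the copies no longer agree with the ref. This is only a sketch of the hypothesis, not WiredTiger code; every type and function below (`toy_ref`, `toy_evict`, and so on) is a hypothetical stand-in, and the page is poisoned in place rather than freed so the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real WT_REF/WT_PAGE/WT_ADDR. */
enum toy_state { TOY_REF_MEM, TOY_REF_DISK };
struct toy_addr { int size; };
struct toy_page { unsigned char bytes[16]; };
struct toy_ref {
    struct toy_page *page;
    struct toy_addr *addr;
    enum toy_state state;
};

/* What the reconciling thread copied out of the ref at an earlier moment. */
struct toy_snapshot {
    struct toy_page *rp;
    struct toy_addr *addr;
};

static struct toy_snapshot
toy_snapshot_ref(const struct toy_ref *ref)
{
    struct toy_snapshot s = { ref->page, ref->addr };
    return (s);
}

/* Stand-in for eviction: poison the page, detach it, record its address. */
static void
toy_evict(struct toy_ref *ref, struct toy_addr *new_addr)
{
    memset(ref->page, 0xab, sizeof(*ref->page));
    ref->page = NULL;
    ref->addr = new_addr;
    ref->state = TOY_REF_DISK;
}

/* Return 1 if the earlier copies no longer match the ref. */
static int
toy_snapshot_is_stale(const struct toy_snapshot *s, const struct toy_ref *ref)
{
    return (s->rp != ref->page || s->addr != ref->addr);
}

/* Replay the interleaving; return 1 if it reproduces the crash-time state. */
static int
toy_replay_interleaving(void)
{
    static struct toy_page child;
    static struct toy_addr on_disk = { 42 };
    struct toy_ref ref = { &child, NULL, TOY_REF_MEM };
    struct toy_snapshot s;

    s = toy_snapshot_ref(&ref);     /* Step 1: reconciliation reads the ref. */
    toy_evict(&ref, &on_disk);      /* Step 2: eviction runs in between. */

    /* Step 3: reconciliation would now dereference s.addr, which is NULL. */
    return (s.addr == NULL && ref.addr != NULL &&
        s.rp != NULL && ref.page == NULL &&
        ref.state == TOY_REF_DISK && s.rp->bytes[0] == 0xab &&
        toy_snapshot_is_stale(&s, &ref));
}
```

After the replay the state has the same shape as the gdb output: the local addr is NULL while ref->addr is set, rp still points at memory filled with 0xab while ref->page is NULL, and the ref state is the on-disk one. Anything that keeps eviction away from the child while its parent is being reconciled would close this window.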
- is related to
-
WT-896 test/format segfault
- Closed
-
WT-898 test/format WT_CHILD_MODIFIED failure
- Closed
- related to
-
WT-900 Use hazard references to prevent child pages being evicted during reconciliation of their parent
- Closed