Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
There was a Jenkins test failure running the medium-multi-lsm wtperf workload:
http://build.wiredtiger.com:8080/job/wiredtiger-perf-med-multi-lsm/941/console
The failure does not reproduce readily. The reported error was:
../../../bench/wtperf/runners/wtperf_run.sh: line 147: 16623 Segmentation fault (core dumped) LD_PRELOAD=/usr/lib64/libjemalloc.so.1 LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib ./wtperf -O $wttest
The stack trace of the crashing thread is:
#0 0x0000000000419084 in __wt_cursor_init ()
#1 0x000000000048e9ce in __wt_curfile_create ()
#2 0x000000000048ebc9 in __wt_curfile_open ()
#3 0x000000000044c568 in __wt_open_cursor ()
#4 0x000000000049c04e in __clsm_open_cursors ()
#5 0x00000000004a1b9f in __wt_clsm_init_merge ()
#6 0x00000000004a2633 in __wt_lsm_merge ()
#7 0x0000000000427877 in __lsm_worker ()
#8 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f821232eb2d in clone () from /lib64/libc.so.6
Several other threads were active at the time of the crash:
Thread 14 (Thread 0x7f820bffb700 (LWP 16634)):
#0 0x00007f82123292c7 in ftruncate64 () from /lib64/libc.so.6
#1 0x000000000042b4be in __wt_ftruncate ()
#2 0x000000000045e831 in __wt_block_truncate ()
#3 0x00000000004ccdc8 in __wt_block_checkpoint_unload ()
#4 0x00000000004b6403 in __bm_checkpoint_unload ()
#5 0x0000000000461972 in __wt_btree_close () at ../src/btree/bt_handle.c:147
#6 0x00000000004848d0 in __wt_conn_btree_sync_and_close ()
#7 0x000000000044de0c in __wt_session_release_btree ()
#8 0x000000000048ce28 in __curfile_close ()
#9 0x00000000004a2ac5 in __wt_lsm_merge ()
#10 0x0000000000427877 in __lsm_worker ()
#11 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f821232eb2d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f820effa700 (LWP 16629)):
#0 0x00007f8212322fe7 in unlink () from /lib64/libc.so.6
#1 0x00007f82122b1259 in remove () from /lib64/libc.so.6
#2 0x000000000042c7f6 in __wt_remove ()
#3 0x00000000004a3fa8 in __lsm_drop_file ()
#4 0x00000000004a4b1d in __wt_lsm_free_chunks ()
#5 0x0000000000427959 in __lsm_worker ()
#6 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f821232eb2d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f820dfff700 (LWP 16630)):
#0 0x00007f82125ff265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f82125fadc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f82125facc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000044e5b7 in __wt_session_get_btree ()
#4 0x000000000044e732 in __wt_session_get_btree_ckpt ()
#5 0x000000000048ecbd in __wt_curfile_open ()
#6 0x000000000044c568 in __wt_open_cursor ()
#7 0x00000000004b8f5b in __wt_bloom_hash_get ()
#8 0x00000000004b8fc8 in __wt_bloom_get ()
#9 0x00000000004a2bad in __wt_lsm_merge ()
#10 0x0000000000427877 in __lsm_worker ()
#11 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f821232eb2d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f820d7fe700 (LWP 16631)):
#0 0x00007f82125ff265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f82125fadc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f82125facc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000044b04f in __session_create ()
#4 0x00000000004b8cdb in __wt_bloom_finalize ()
#5 0x00000000004a456f in __wt_lsm_work_bloom ()
#6 0x0000000000427995 in __lsm_worker ()
#7 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f821232eb2d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f820c7fc700 (LWP 16633)):
#0 0x00007f8212f248df in bitmap_sfu (arena=0x7f8211c51ac0, tbin=0x7f8202806088, binind=3, prof_accumbytes=<value optimized out>) at include/jemalloc/internal/bitmap.h:137
#1 arena_run_reg_alloc (arena=0x7f8211c51ac0, tbin=0x7f8202806088, binind=3, prof_accumbytes=<value optimized out>) at src/arena.c:325
#2 arena_tcache_fill_small (arena=0x7f8211c51ac0, tbin=0x7f8202806088, binind=3, prof_accumbytes=<value optimized out>) at src/arena.c:1348
#3 0x00007f8212f3d6ff in tcache_alloc_small_hard (tcache=<value optimized out>, tbin=0x7f8202806088, binind=<value optimized out>) at src/tcache.c:72
#4 0x00007f8212f1d85a in tcache_alloc_small (num=<value optimized out>, size=<value optimized out>) at include/jemalloc/internal/tcache.h:302
#5 arena_malloc (num=<value optimized out>, size=<value optimized out>) at include/jemalloc/internal/arena.h:916
#6 icallocx (num=<value optimized out>, size=<value optimized out>) at include/jemalloc/internal/jemalloc_internal.h:800
#7 icalloc (num=<value optimized out>, size=<value optimized out>) at include/jemalloc/internal/jemalloc_internal.h:809
#8 calloc (num=<value optimized out>, size=<value optimized out>) at src/jemalloc.c:1079
#9 0x0000000000429d60 in __wt_calloc ()
#10 0x0000000000465eb3 in __wt_page_alloc ()
#11 0x0000000000465fec in __wt_page_inmem ()
#12 0x0000000000468568 in __wt_cache_read ()
#13 0x0000000000465933 in __wt_page_in_func ()
#14 0x000000000047866d in __wt_tree_walk ()
#15 0x00000000004b9a86 in __wt_btcur_next ()
#16 0x000000000048c9fa in __curfile_next ()
#17 0x000000000049db36 in __clsm_next ()
#18 0x00000000004a4542 in __wt_lsm_work_bloom ()
#19 0x0000000000427995 in __lsm_worker ()
#20 0x00007f82125f8f18 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f821232eb2d in clone () from /lib64/libc.so.6
The failure happened towards the end of a load phase. The final three lines of WT_TEST/test.stat:
579597 populate inserts (44372348 of 50000000) in 5 secs (180 total secs)
379201 populate inserts (44751549 of 50000000) in 5 secs (185 total secs)
984072 populate inserts (45735621 of 50000000) in 5 secs (190 total secs)