Description
I noticed today that the Jenkins job for medium-lsm-compact was hung. It is presumably a deadlock involving the schema and dhandle locks. The stacks from the job do not include line numbers. Here are the stacks of the waiting threads:
Thread 10 (Thread 0x7f4904bfe700 (LWP 17853)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000481ee4 in __wt_conn_dhandle_discard_single ()
#4 0x00000000004117cc in __sweep_server ()
#5 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49033fb700 (LWP 17856)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000004211af in __lsm_worker_manager ()
#4 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f4902bfa700 (LWP 17857)):
#0 0x00007f4905e7cb97 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000480bd7 in __conn_dhandle_open_lock ()
#2 0x00000000004815b7 in __wt_conn_btree_get ()
#3 0x000000000044b492 in __wt_session_get_btree ()
#4 0x0000000000481d8f in __wt_conn_dhandle_close_all ()
#5 0x00000000004419bf in __wt_schema_drop ()
#6 0x000000000049fa1f in __lsm_drop_file ()
#7 0x00000000004a056d in __wt_lsm_free_chunks ()
#8 0x00000000004255b9 in __lsm_worker ()
#9 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f4901bff700 (LWP 17858)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000044b3a3 in __wt_session_get_btree ()
#4 0x000000000044b6d2 in __wt_session_get_btree_ckpt ()
#5 0x000000000048b13f in __wt_curfile_open ()
#6 0x00000000004498d0 in __wt_open_cursor ()
#7 0x0000000000449b25 in __session_open_cursor ()
#8 0x00000000004b04e4 in __wt_bloom_finalize ()
#9 0x000000000049ffbf in __wt_lsm_work_bloom ()
#10 0x00000000004255f5 in __lsm_worker ()
#11 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f4900bfd700 (LWP 17860)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000448706 in __session_create ()
#4 0x00000000004b04bb in __wt_bloom_finalize ()
#5 0x000000000049e5d2 in __wt_lsm_merge ()
#6 0x00000000004254d7 in __lsm_worker ()
#7 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

I will try to repro on the AWS HDD machine.
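For anyone reading this ticket later, here is a minimal sketch of the kind of lock-order inversion suspected in this hang. It is not WiredTiger code: schema_lock, handle_lock, take_in_order and count_blocked_threads are hypothetical names, and it is only an assumption that the real hang has this exact shape. Each thread takes one lock and then tries the other with pthread_mutex_trylock, so the sketch reports the conflict instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; these are not WiredTiger's actual locks. */
static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t handle_lock = PTHREAD_MUTEX_INITIALIZER;

static atomic_int holding;  /* threads that hold their first lock */
static atomic_int finished; /* threads that have tried their second lock */
static atomic_int blocked;  /* threads whose second lock was unavailable */

struct lock_order {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
};

static void *
take_in_order(void *arg)
{
    struct lock_order *order = arg;

    pthread_mutex_lock(order->first);
    atomic_fetch_add(&holding, 1);
    /* Wait until both threads hold their first lock. */
    while (atomic_load(&holding) < 2)
        sched_yield();

    /* A blocking pthread_mutex_lock() here would never return. */
    if (pthread_mutex_trylock(order->second) == 0)
        pthread_mutex_unlock(order->second);
    else
        atomic_fetch_add(&blocked, 1);

    /* Keep the first lock until the peer has made its attempt. */
    atomic_fetch_add(&finished, 1);
    while (atomic_load(&finished) < 2)
        sched_yield();
    pthread_mutex_unlock(order->first);
    return NULL;
}

/* Returns how many of the two threads found their second lock held. */
int
count_blocked_threads(void)
{
    struct lock_order a = { &schema_lock, &handle_lock };
    struct lock_order b = { &handle_lock, &schema_lock };
    pthread_t ta, tb;
    int rc;

    atomic_store(&holding, 0);
    atomic_store(&finished, 0);
    atomic_store(&blocked, 0);

    rc = pthread_create(&ta, NULL, take_in_order, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, take_in_order, &b);
    assert(rc == 0);
    (void)rc;

    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return atomic_load(&blocked);
}
```

Calling count_blocked_threads() returns 2: both threads find their second lock held by the other, which with blocking locks is a deadlock. The usual remedies are a single agreed lock order, or having one side back off instead of waiting.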
Issue Links
- is related to
-
WT-1819 Split sweep into two passes
- Closed
- related to
-
WT-1811 Change sweep to not wait on the dhandle list lock
- Closed