  WiredTiger / WT-1814

Deadlock caused by sweep server

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT2.5.2
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      I noticed today that the Jenkins job for medium-lsm-compact was hung. Presumably it is a deadlock around the schema and dhandle locks; a minimal sketch of the suspected inversion follows the stacks below. The build does not include line numbers. Here are the stacks of the waiting threads:

      Thread 10 (Thread 0x7f4904bfe700 (LWP 17853)):
      #0  0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
      #2  0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x0000000000481ee4 in __wt_conn_dhandle_discard_single ()
      #4  0x00000000004117cc in __sweep_server ()
      #5  0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
      #6  0x00007f4905e93b2d in clone () from /lib64/libc.so.6
      
      Thread 7 (Thread 0x7f49033fb700 (LWP 17856)):
      #0  0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
      #2  0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x00000000004211af in __lsm_worker_manager ()
      #4  0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
      #5  0x00007f4905e93b2d in clone () from /lib64/libc.so.6
      
      Thread 6 (Thread 0x7f4902bfa700 (LWP 17857)):
      #0  0x00007f4905e7cb97 in sched_yield () from /lib64/libc.so.6
      #1  0x0000000000480bd7 in __conn_dhandle_open_lock ()
      #2  0x00000000004815b7 in __wt_conn_btree_get ()
      #3  0x000000000044b492 in __wt_session_get_btree ()
      #4  0x0000000000481d8f in __wt_conn_dhandle_close_all ()
      #5  0x00000000004419bf in __wt_schema_drop ()
      #6  0x000000000049fa1f in __lsm_drop_file ()
      #7  0x00000000004a056d in __wt_lsm_free_chunks ()
      #8  0x00000000004255b9 in __lsm_worker ()
      #9  0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
      #10 0x00007f4905e93b2d in clone () from /lib64/libc.so.6
      
      Thread 5 (Thread 0x7f4901bff700 (LWP 17858)):
      #0  0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
      #2  0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x000000000044b3a3 in __wt_session_get_btree ()
      #4  0x000000000044b6d2 in __wt_session_get_btree_ckpt ()
      #5  0x000000000048b13f in __wt_curfile_open ()
      #6  0x00000000004498d0 in __wt_open_cursor ()
      #7  0x0000000000449b25 in __session_open_cursor ()
      #8  0x00000000004b04e4 in __wt_bloom_finalize ()
      #9  0x000000000049ffbf in __wt_lsm_work_bloom ()
      #10 0x00000000004255f5 in __lsm_worker ()
      #11 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
      #12 0x00007f4905e93b2d in clone () from /lib64/libc.so.6
      
      Thread 3 (Thread 0x7f4900bfd700 (LWP 17860)):
      #0  0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
      #2  0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x0000000000448706 in __session_create ()
      #4  0x00000000004b04bb in __wt_bloom_finalize ()
      #5  0x000000000049e5d2 in __wt_lsm_merge ()
      #6  0x00000000004254d7 in __lsm_worker ()
      #7  0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
      #8  0x00007f4905e93b2d in clone () from /lib64/libc.so.6
      

      I will try to repro on the AWS HDD machine.
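
      For the record, the shape of the hang looks like a classic two-lock ordering inversion: one path takes the dhandle lock and then wants the schema lock, while the other takes them in the opposite order. Here is a minimal standalone sketch of that shape in plain pthreads; the lock names and thread bodies are hypothetical stand-ins for illustration, not WiredTiger code:

      /*
       * Hypothetical illustration of the suspected inversion: a sweep-style
       * path and a drop-style path acquire the same two locks in opposite
       * orders, so with unlucky timing each blocks in its second
       * pthread_mutex_lock() call and neither ever returns.
       */
      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t dhandle_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Sweep-style path: dhandle lock first, then schema lock. */
      static void *sweeper(void *arg)
      {
          (void)arg;
          pthread_mutex_lock(&dhandle_lock);
          pthread_mutex_lock(&schema_lock);   /* blocks if a drop holds it */
          pthread_mutex_unlock(&schema_lock);
          pthread_mutex_unlock(&dhandle_lock);
          return (NULL);
      }

      /* Drop-style path: schema lock first, then dhandle lock. */
      static void *dropper(void *arg)
      {
          (void)arg;
          pthread_mutex_lock(&schema_lock);
          pthread_mutex_lock(&dhandle_lock);  /* blocks if sweep holds it */
          pthread_mutex_unlock(&dhandle_lock);
          pthread_mutex_unlock(&schema_lock);
          return (NULL);
      }

      int main(void)
      {
          pthread_t t1, t2;

          pthread_create(&t1, NULL, sweeper, NULL);
          pthread_create(&t2, NULL, dropper, NULL);
          pthread_join(t1, NULL);             /* with unlucky timing, hangs here */
          pthread_join(t2, NULL);
          printf("no deadlock this run\n");
          return (0);
      }

      That would match the stacks above: the sweep server is blocked in pthread_mutex_lock under __wt_conn_dhandle_discard_single, while the LSM worker holding the schema lock in __wt_schema_drop is yielding in __conn_dhandle_open_lock waiting for a handle.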

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
            Votes:
            0
            Watchers:
            1
