WiredTiger / WT-4653

test/format failing with LSM deadlock

      Several recent Jenkins test/format jobs have failed with "format run more than 15 minutes past the maximum time" even though the cache was not full. One of them finally occurred on a non-sanitizer run, so a core file is available (sanitizer runs do not produce core files). Another recent LSM deadlock, WT-4577, has been fixed, but this one doesn't look related at first glance.

      The changeset of the failure is 07a346d2d683cc8.

      The job that failed with a core is:
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-ppc/20763/
      There is a tarball on the PPC machine in the Jenkins directory.

      Other failures that are likely the same issue (criteria: cache not full and LSM in use):
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-sanitizer-ppc/7241/
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-sanitizer-ppc/7091/

      The stacks of interesting threads are:

      Thread 11 (Thread 0x3fff9966f1b0 (LWP 40010)):
      #0  0x00003fff9f631938 in __lll_lock_wait () from /lib64/power8/libpthread.so.0
      #1  0x00003fff9f62af8c in pthread_mutex_lock ()
         from /lib64/power8/libpthread.so.0
      #2  0x00000000100e99c4 in __wt_spin_lock (session=0x3fff9eeaa308, 
          t=0x100224fc968) at ../src/include/mutex.i:173
      #3  0x00000000100e9d50 in __wt_spin_lock_track (session=0x3fff9eeaa308, 
          t=0x100224fc968) at ../src/include/mutex.i:323
      #4  0x00000000100ee108 in __wt_txn_checkpoint (session=0x3fff9eeaa308, 
          cfg=0x3fff9966e688, waiting=true) at ../src/txn/txn_ckpt.c:1171
      #5  0x00000000100c571c in __session_checkpoint (wt_session=0x3fff9eeaa308, 
          config=0x0) at ../src/session/session_api.c:1984
      #6  0x00000000100169a4 in checkpoint (arg=0x0)
          at ../../../test/format/util.c:571
      #7  0x00003fff9f628944 in start_thread () from /lib64/power8/libpthread.so.0
      #8  0x00003fff9f4a7640 in clone () from /lib64/power8/libc.so.6
      
      Thread 10 (Thread 0x3fff99e6f1b0 (LWP 40009)):
      #0  0x00003fff9f6318e8 in __lll_lock_wait () from /lib64/power8/libpthread.so.0
      #1  0x00003fff9f62af8c in pthread_mutex_lock ()
         from /lib64/power8/libpthread.so.0
      #2  0x0000000010061078 in __wt_spin_lock (session=0x3fff9eea4140, 
          t=0x100224fca68) at ../src/include/mutex.i:173
      #3  0x000000001006222c in __lsm_tree_close (session=0x3fff9eea4140, 
          lsm_tree=0x3fff600520c0, final=false) at ../src/lsm/lsm_tree.c:135
      #4  0x0000000010063580 in __lsm_tree_find (session=0x3fff9eea4140, 
          uri=0x3fff58001600 "lsm:wt", exclusive=true, treep=0x3fff99e6e208)
          at ../src/lsm/lsm_tree.c:432
      #5  0x0000000010063d78 in __wt_lsm_tree_get (session=0x3fff9eea4140, 
          uri=0x3fff58001600 "lsm:wt", exclusive=true, treep=0x3fff99e6e208)
          at ../src/lsm/lsm_tree.c:580
      #6  0x000000001006692c in __wt_lsm_tree_worker (session=0x3fff9eea4140, 
          uri=0x3fff58001600 "lsm:wt", file_func=0x1020d090 <__alter_file>, 
          name_func=0x0, cfg=0x3fff99e6e6c0, open_flags=336)
          at ../src/lsm/lsm_tree.c:1388
      #7  0x000000001020db38 in __schema_alter (session=0x3fff9eea4140, 
          uri=0x3fff58001600 "lsm:wt", newcfg=0x3fff99e6e6c0)
          at ../src/schema/schema_alter.c:235
      #8  0x000000001020d3c0 in __alter_tree (session=0x3fff9eea4140, 
          name=0x10022596df0 "colgroup:wt", newcfg=0x3fff99e6e6c0)
          at ../src/schema/schema_alter.c:116
      #9  0x000000001020d6e8 in __alter_table (session=0x3fff9eea4140, 
          uri=0x100224684f0 "table:wt", newcfg=0x3fff99e6e6c0, 
          exclusive_refreshed=true) at ../src/schema/schema_alter.c:175
      #10 0x000000001020db9c in __schema_alter (session=0x3fff9eea4140, 
          uri=0x100224684f0 "table:wt", newcfg=0x3fff99e6e6c0)
          at ../src/schema/schema_alter.c:238
      #11 0x000000001020dc7c in __wt_schema_alter (session=0x3fff9eea4140, 
          uri=0x100224684f0 "table:wt", newcfg=0x3fff99e6e6c0)
          at ../src/schema/schema_alter.c:257
      #12 0x00000000100b4788 in __session_alter (wt_session=0x3fff9eea4140, 
          uri=0x100224684f0 "table:wt", 
          config=0x3fff99e6e7a0 "access_pattern_hint=random")
          at ../src/session/session_api.c:670
      #13 0x000000001001707c in alter (arg=0x0) at ../../../test/format/util.c:674
      #14 0x00003fff9f628944 in start_thread () from /lib64/power8/libpthread.so.0
      #15 0x00003fff9f4a7640 in clone () from /lib64/power8/libc.so.6
      
      Thread 9 (Thread 0x3fff98e6f1b0 (LWP 40011)):
      #0  0x00003fff9f62dfb4 in pthread_cond_timedwait@@GLIBC_2.17 ()
         from /lib64/power8/libpthread.so.0
      #1  0x000000001007d030 in __wt_cond_wait_signal (session=0x3fff9eeaa7b0, 
          cond=0x100225338b0, usecs=10000, run_func=0x100d4bb0 <__read_blocked>, 
          signalled=0x3fff98e6e1c0) at ../src/os_posix/os_mtx_cond.c:122
      #2  0x00000000100d4740 in __wt_cond_wait (session=0x3fff9eeaa7b0, 
          cond=0x100225338b0, usecs=10000, run_func=0x100d4bb0 <__read_blocked>)
          at ../src/include/misc.i:19
      #3  0x00000000100d4f48 in __wt_readlock (session=0x3fff9eeaa7b0, 
          l=0x10022533680) at ../src/support/mtx_rw.c:259
      #4  0x00000000100ca0dc in __wt_session_lock_dhandle (session=0x3fff9eeaa7b0, 
          flags=0, is_deadp=0x3fff98e6e438) at ../src/session/session_dhandle.c:183
      #5  0x00000000100cb37c in __wt_session_get_dhandle (session=0x3fff9eeaa7b0, 
          uri=0x100224684f0 "table:wt", checkpoint=0x0, cfg=0x0, flags=0)
          at ../src/session/session_dhandle.c:509
      #6  0x00000000100a5760 in __wt_schema_get_table_uri (session=0x3fff9eeaa7b0, 
          uri=0x100224684f0 "table:wt", ok_incomplete=false, flags=0, 
          tablep=0x3fff98e6e580) at ../src/schema/schema_list.c:27
      #7  0x00000000100af084 in __wt_schema_worker (session=0x3fff9eeaa7b0, 
          uri=0x100224684f0 "table:wt", 
          file_func=0x100c77e4 <__compact_handle_append>, 
          name_func=0x100c76d4 <__compact_uri_analyze>, cfg=0x3fff98e6e6e8, 
          open_flags=0) at ../src/schema/schema_worker.c:97
      #8  0x00000000100c8918 in __wt_session_compact (wt_session=0x3fff9eeaa7b0, 
          uri=0x100224684f0 "table:wt", config=0x0)
          at ../src/session/session_compact.c:413
      #9  0x0000000010005b04 in compact (arg=0x0)
          at ../../../test/format/compact.c:74
      #10 0x00003fff9f628944 in start_thread () from /lib64/power8/libpthread.so.0
      #11 0x00003fff9f4a7640 in clone () from /lib64/power8/libc.so.6
      
      Thread 6 (Thread 0x3fff9ae6f1b0 (LWP 39965)):
      #0  0x00003fff9f6318e8 in __lll_lock_wait () from /lib64/power8/libpthread.so.0
      #1  0x00003fff9f62af8c in pthread_mutex_lock ()
         from /lib64/power8/libpthread.so.0
      #2  0x0000000010202670 in __wt_spin_lock (session=0x3fff9eea4f38, 
          t=0x100224fc968) at ../src/include/mutex.i:173
      #3  0x00000000102029fc in __wt_spin_lock_track (session=0x3fff9eea4f38, 
          t=0x100224fc968) at ../src/include/mutex.i:323
      #4  0x0000000010204770 in __wt_lsm_checkpoint_chunk (session=0x3fff9eea4f38, 
          lsm_tree=0x3fff600520c0, chunk=0x3fff48031a70)
          at ../src/lsm/lsm_work_unit.c:456
      #5  0x0000000010067128 in __lsm_worker_general_op (session=0x3fff9eea4f38, 
          cookie=0x100225033d8, completed=0x3fff9ae6e7b9)
          at ../src/lsm/lsm_worker.c:79
      #6  0x00000000100673b8 in __lsm_worker (arg=0x100225033d8)
          at ../src/lsm/lsm_worker.c:135
      #7  0x00003fff9f628944 in start_thread () from /lib64/power8/libpthread.so.0
      #8  0x00003fff9f4a7640 in clone () from /lib64/power8/libc.so.6
      
      Thread 3 (Thread 0x3fff9a66f1b0 (LWP 40008)):
      #0  0x00003fff9f62dfb4 in pthread_cond_timedwait@@GLIBC_2.17 ()
         from /lib64/power8/libpthread.so.0
      #1  0x000000001007d030 in __wt_cond_wait_signal (session=0x3fff9eeaac58, 
          cond=0x100225338b0, usecs=10000, run_func=0x100d4bb0 <__read_blocked>, 
          signalled=0x3fff9a66bc80) at ../src/os_posix/os_mtx_cond.c:122
      #2  0x00000000100d4740 in __wt_cond_wait (session=0x3fff9eeaac58, 
          cond=0x100225338b0, usecs=10000, run_func=0x100d4bb0 <__read_blocked>)
          at ../src/include/misc.i:19
      #3  0x00000000100d4f48 in __wt_readlock (session=0x3fff9eeaac58, 
          l=0x10022533680) at ../src/support/mtx_rw.c:259
      #4  0x00000000100ca0dc in __wt_session_lock_dhandle (session=0x3fff9eeaac58, 
          flags=0, is_deadp=0x3fff9a66bef8) at ../src/session/session_dhandle.c:183
      #5  0x00000000100cb37c in __wt_session_get_dhandle (session=0x3fff9eeaac58, 
          uri=0x100224684f0 "table:wt", checkpoint=0x0, cfg=0x0, flags=0)
          at ../src/session/session_dhandle.c:509
      #6  0x00000000100a5760 in __wt_schema_get_table_uri (session=0x3fff9eeaac58, 
          uri=0x100224684f0 "table:wt", ok_incomplete=false, flags=0, 
          tablep=0x3fff9a66c0d8) at ../src/schema/schema_list.c:27
      #7  0x00000000101d7854 in __wt_curtable_open (session=0x3fff9eeaac58, 
          uri=0x100224684f0 "table:wt", owner=0x0, cfg=0x3fff9a66c270, 
          cursorp=0x3fff9a66c268) at ../src/cursor/cur_table.c:987
      #8  0x00000000100b2efc in __session_open_cursor_int (session=0x3fff9eeaac58, 
          uri=0x100224684f0 "table:wt", owner=0x0, other=0x0, cfg=0x3fff9a66c270, 
          cursorp=0x3fff9a66c268) at ../src/session/session_api.c:445
      #9  0x00000000100b3ea0 in __session_open_cursor (wt_session=0x3fff9eeaac58, 
          uri=0x100224684f0 "table:wt", to_dup=0x0, config=0x10448860 "append", 
          cursorp=0x3fff9a66e7a0) at ../src/session/session_api.c:616
      #10 0x000000001000cf40 in ops (arg=0x100225158b0)
          at ../../../test/format/ops.c:827
      #11 0x00003fff9f628944 in start_thread () from /lib64/power8/libpthread.so.0
      #12 0x00003fff9f4a7640 in clone () from /lib64/power8/libc.so.6
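
To see which threads are contending for the same lock, the stacks above can be grouped by the lock address passed to the blocking call. The following is a minimal sketch, not part of WiredTiger or test/format: the function name `group_waiters` and the regular expressions are assumptions made for illustration, and it only understands the `t=` (spin lock) and `l=` (read/write lock) argument names visible in the `__wt_spin_lock` and `__wt_readlock` frames of this backtrace.

```python
import re
from collections import defaultdict

# Matches "Thread 11 (Thread 0x... (LWP 40010)):" headers in gdb output.
THREAD_RE = re.compile(r"^\s*Thread (\d+) \(")
# Matches the start of a frame line, e.g. "#2  0x... in __wt_spin_lock (".
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-f]+ in (\w+)")
# Matches a lock argument: "t=0x..." for spin locks, "l=0x..." for rwlocks.
LOCK_RE = re.compile(r"\b[tl]=(0x[0-9a-f]+)")
# Frames treated as "blocked acquiring a lock" (taken from the stacks above).
LOCK_FUNCS = {"__wt_spin_lock", "__wt_readlock"}


def group_waiters(backtrace):
    """Map each lock address to the thread numbers blocked acquiring it."""
    waiters = defaultdict(list)
    thread = None
    in_lock_frame = False
    for line in backtrace.splitlines():
        m = THREAD_RE.match(line)
        if m:
            thread = int(m.group(1))
            in_lock_frame = False
            continue
        m = FRAME_RE.match(line)
        if m:
            # Arguments may wrap onto the next line, so remember the frame.
            in_lock_frame = m.group(1) in LOCK_FUNCS
        if in_lock_frame and thread is not None:
            lock = LOCK_RE.search(line)
            if lock:
                if thread not in waiters[lock.group(1)]:
                    waiters[lock.group(1)].append(thread)
                in_lock_frame = False
    return dict(waiters)
```

Applied to the stacks in this ticket, this groups threads 11 and 6 on spin lock 0x100224fc968, thread 10 on spin lock 0x100224fca68, and threads 9 and 3 on read/write lock 0x10022533680. The stacks alone do not show which thread holds each lock, so the grouping is only a starting point for working out the lock ordering.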
      

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso