Bulk cursor and drop segmentation fault


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines
    • SE Foundations - Q3+ Backlog

      Found while working on WT-15225: a modification to the Python test causes a segmentation fault. I have attached the reproducer. Here is the backtrace:

      #0  0x00007ffff752a00b in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007ffff7509859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00007ffff6ba3da8 in __wt_abort (session=0x8a8be0) at /data/wiredtiger/src/os_common/os_abort.c:31
      #3  0x00007ffff6c168b9 in __rec_init (session=0x8a8be0, ref=0x7c8f90, flags=4336, salvage=0x0, reconcilep=0x8a9010) at /data/wiredtiger/src/reconcile/rec_write.c:641
      #4  0x00007ffff6c151b3 in __reconcile (session=0x8a8be0, ref=0x7c8f90, salvage=0x0, flags=4336, page_lockedp=0x7fffffff954f) at /data/wiredtiger/src/reconcile/rec_write.c:275
      #5  0x00007ffff6c1434b in __wt_reconcile (session=0x8a8be0, ref=0x7c8f90, salvage=0x0, flags=4336) at /data/wiredtiger/src/reconcile/rec_write.c:124
      #6  0x00007ffff6b3428c in __wt_evict_file (session=0x8a8be0, syncop=WT_SYNC_CLOSE) at /data/wiredtiger/src/evict/evict_file.c:89
      #7  0x00007ffff6a4e02c in __checkpoint_tree (session=0x8a8be0, is_checkpoint=false, cfg=0x0) at /data/wiredtiger/src/checkpoint/checkpoint_txn.c:2549
      #8  0x00007ffff6a4f23f in __wt_checkpoint_close (session=0x8a8be0, final=false) at /data/wiredtiger/src/checkpoint/checkpoint_txn.c:2840
      #9  0x00007ffff6a7d44d in __wt_conn_dhandle_close (session=0x8a8be0, final=false, mark_dead=false, check_visibility=true) at /data/wiredtiger/src/conn/conn_dhandle.c:448
      #10 0x00007ffff6a7ee8e in __conn_dhandle_close_one (session=0x8a8be0, uri=0x7d21a0 "file:test_drop04.wt", checkpoint=0x0, removed=true, mark_dead=false, check_visibility=true)
          at /data/wiredtiger/src/conn/conn_dhandle.c:833
      #11 0x00007ffff6a7f15d in __wt_conn_dhandle_close_all (session=0x8a8be0, uri=0x7d21a0 "file:test_drop04.wt", removed=true, mark_dead=false, check_visibility=true) at /data/wiredtiger/src/conn/conn_dhandle.c:873
      #12 0x00007ffff6c34eab in __drop_file (session=0x8a8be0, uri=0x7d21a0 "file:test_drop04.wt", force=false, cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:40
      #13 0x00007ffff6c371dd in __schema_drop (session=0x8a8be0, uri=0x7d21a0 "file:test_drop04.wt", cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:443
      #14 0x00007ffff6c375bc in __wt_schema_drop (session=0x8a8be0, uri=0x7d21a0 "file:test_drop04.wt", cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:494
      #15 0x00007ffff6c35d4c in __drop_table (session=0x8a8be0, uri=0x6af030 "table:test_drop04", force=false, cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:225
      #16 0x00007ffff6c372af in __schema_drop (session=0x8a8be0, uri=0x6af030 "table:test_drop04", cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:449
      #17 0x00007ffff6c375bc in __wt_schema_drop (session=0x8a8be0, uri=0x6af030 "table:test_drop04", cfg=0x7fffffffc070, check_visibility=true) at /data/wiredtiger/src/schema/schema_drop.c:494
      #18 0x00007ffff6c5879a in __session_drop (wt_session=0x8a8be0, uri=0x6af030 "table:test_drop04", config=0x4ed3b0 "force=false") at /data/wiredtiger/src/session/session_api.c:1295
      #19 0x00007ffff6e9c8bc in _wrap_Session_drop (self=0x7ffff6faac50, args=0x7ffff64dc280) at /data/wiredtiger/cmake_build/lang/python/CMakeFiles/wiredtiger_python.dir/wiredtigerPYTHON_wrap.c:6379
      #20 0x00007ffff79e8558 in cfunction_call (func=0x7ffff7046700, args=<optimized out>, kwargs=<optimized out>) at ../src/Python-3.10.4/Objects/methodobject.c:552
      

      I have dug deeper into this problem; here is a list of symptoms that I found:

      • The bulk handle sets the WT_DHANDLE_EXCLUSIVE and WT_BTREE_BULK flags.
      • With normal cursors open, drop() fails with EBUSY because of lock contention within __wt_session_dhandle_try_writelock.
      • The bulk cursor should return EBUSY, much like normal cursors do, but instead the drop() continues.

      From the symptoms, I found that the bulk cursor case, which should return EBUSY, instead returns 0 (success) at this point:

          if (dhandle->excl_session == session) {
              if (!LF_ISSET(WT_DHANDLE_LOCK_ONLY) &&
                (!F_ISSET(dhandle, WT_DHANDLE_OPEN) ||
                  (btree != NULL && F_ISSET(btree, WT_BTREE_SPECIAL_FLAGS))))
                  return (__wt_set_return(session, EBUSY));
              ++dhandle->excl_ref;
              return (0);
          }
      

      The code here means that the same session that already holds WT_DHANDLE_EXCLUSIVE can simply take the handle again. I don't think this is correct.
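      To make the short-circuit easier to see, here is a small pure-Python model of the quoted condition. The `Dhandle`/`lock_dhandle` names and the flag values are invented for illustration; only the boolean structure is taken from the C snippet above. Under the assumption that the drop path acquires the handle with WT_DHANDLE_LOCK_ONLY set (which would explain the 0 return seen here, since WT_BTREE_BULK is part of WT_BTREE_SPECIAL_FLAGS), the special-flags check is skipped entirely for the session that already holds the handle exclusive:

```python
# Illustrative flag constants; not the real WiredTiger definitions.
EBUSY = 16
WT_DHANDLE_OPEN = 0x1
WT_DHANDLE_LOCK_ONLY = 0x2
WT_BTREE_BULK = 0x4  # stands in for WT_BTREE_SPECIAL_FLAGS here

class Dhandle:
    def __init__(self, excl_session, dhandle_flags, btree_flags):
        self.excl_session = excl_session  # session holding the handle exclusive
        self.flags = dhandle_flags
        self.btree_flags = btree_flags
        self.excl_ref = 1

def lock_dhandle(dhandle, session, lf_flags):
    """Mirror of the C condition quoted above, nothing more."""
    if dhandle.excl_session is session:
        special = dhandle.btree_flags & WT_BTREE_BULK
        if not (lf_flags & WT_DHANDLE_LOCK_ONLY) and (
            not (dhandle.flags & WT_DHANDLE_OPEN) or special
        ):
            return EBUSY
        dhandle.excl_ref += 1
        return 0
    # Any other session would contend on the write lock (not modeled).
    return EBUSY

session = object()
dh = Dhandle(session, WT_DHANDLE_OPEN, WT_BTREE_BULK)

# Without LOCK_ONLY, the bulk (special) flag correctly yields EBUSY:
print(lock_dhandle(dh, session, 0))                     # 16 (EBUSY)

# With LOCK_ONLY, the special-flags check is skipped and the same
# session gets 0 back, letting the drop proceed into eviction:
print(lock_dhandle(dh, session, WT_DHANDLE_LOCK_ONLY))  # 0
```

      In other words, once `dhandle->excl_session == session`, the LOCK_ONLY path unconditionally bumps `excl_ref` and succeeds, regardless of the bulk state of the btree.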

        Assignee:
        [DO NOT USE] Backlog - Storage Engines Team
        Reporter:
        Jie Chen