WT-3268: Failure to close cursor can get WiredTiger stuck in a cursor-close loop

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.3, 3.5.9
    • Component/s: None
    • Labels:
      None

      Description

      Fault injection (WT-32) identified this as a potential bug. Details follow.

      While the cursors were being closed (and hence a checkpoint was being written), a fault was injected to make ftruncate fail. In the non-debug build this caused the application to hang. The following backtraces were captured a few seconds apart while the application appeared hung:

      1:__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
      

      1:__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
      

      1:__strcmp_sse2_unaligned,__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
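
In all three samples the main thread is inside __session_close, and in the third it is executing a strcmp called from there, which is consistent with a thread that is spinning rather than blocked. The following is a minimal sketch of how a drain-the-list close loop can spin when a failed close leaves the cursor on the list. It is not WiredTiger's code: the structs, function names, and the iteration cap are hypothetical, and the ticket does not state the root cause beyond its title.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a session and its list of open cursors. */
struct cursor {
    struct cursor *next;
    int fail_close;              /* non-zero: simulate a close that fails */
};

struct session {
    struct cursor *head;
};

typedef int (*close_fn)(struct session *, struct cursor *);

/* Fragile close: on failure it returns before unlinking the cursor. */
static int close_unlink_last(struct session *s, struct cursor *c)
{
    if (c->fail_close)
        return -1;               /* cursor stays at the head of the list */
    s->head = c->next;
    return 0;
}

/* Safer close: unlink first, so a failure cannot leave the cursor listed. */
static int close_unlink_first(struct session *s, struct cursor *c)
{
    s->head = c->next;
    return c->fail_close ? -1 : 0;
}

/*
 * Close cursors until the list is empty. Returns the number of iterations
 * used, or -1 once max_iter is reached (this demo's stand-in for a hang).
 */
static int drain(struct session *s, close_fn close_cursor, int max_iter)
{
    int iter = 0;

    assert(max_iter > 0);
    while (s->head != NULL) {
        if (iter++ >= max_iter)
            return -1;
        /* The error is ignored here; only list emptiness ends the loop. */
        (void)close_cursor(s, s->head);
    }
    return iter;
}
```

With a two-cursor list whose first cursor fails to close, drain with close_unlink_last never gets past the first cursor and hits the cap, while drain with close_unlink_first finishes in two iterations.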
      

      The fault was induced by making ftruncate fail when it was called with the following backtrace:

      ftruncate
      __posix_file_truncate
      __wt_ftruncate
      __wt_block_truncate
      __wt_block_extlist_truncate
      __ckpt_process
      __wt_block_checkpoint
      __bm_checkpoint
      __rec_write_wrapup
      __wt_reconcile
      __wt_evict_file
      __wt_cache_op
      __checkpoint_tree
      __wt_checkpoint_close
      __wt_conn_btree_sync_and_close
      __wt_session_release_btree
      __curfile_close
      __session_close
      __conn_close
      start_run
      

      Until the fault-injection infrastructure becomes available, the following command reproduces the bug, provided the fault-injection library is at hand:

      FAULTINJECT_LIBRARY_NAME='__wt' LD_LIBRARY_PATH='/path/to/fi-lib/.libs:/path/to/wiredtiger/build_posix/.libs' LD_PRELOAD='/path/to/fi-lib/.libs/libfaultinject.so' FAULTINJECT_FAIL_COUNT=113 ./bench/wtperf/wtperf -O ../bench/wtperf/runners/medium-btree.wtperf -o verbose=2
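
For readers without the library, the general idea of count-based fault injection can be sketched as below. This is an illustration only and not the library used above: fi_ftruncate, fi_calls, and fi_fail_count are hypothetical names, and reading FAULTINJECT_FAIL_COUNT as "the 1-based number of the call to fail" is an assumption about what that variable means.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of wrapped calls seen so far (hypothetical counter). */
static long fi_calls;

/*
 * Read the call number to fail from the environment.
 * ASSUMPTION: the value is a 1-based call count; unset means "never fail".
 */
static long fi_fail_count(void)
{
    const char *v = getenv("FAULTINJECT_FAIL_COUNT");

    return v == NULL ? 0 : strtol(v, NULL, 10);
}

/* Fail exactly the Nth call with EIO; pass every other call through. */
static int fi_ftruncate(int fd, off_t length)
{
    if (++fi_calls == fi_fail_count()) {
        errno = EIO;             /* injected failure; real call is skipped */
        return -1;
    }
    return ftruncate(fd, length);
}
```

An LD_PRELOAD library would typically apply the same counting logic by interposing on the libc symbol instead of using a separately named wrapper. Under the assumption stated above, a value of 113 would make the 113th counted call fail.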
      

            People

            • Assignee: Sulabh Mahajan (sulabh.mahajan)
            • Reporter: Sulabh Mahajan (sulabh.mahajan)