WiredTiger / WT-3161

checkpoint hang after write failure injection.

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.9.2, 3.2.13, 3.4.3, 3.5.4
    • Affects Version/s: WT2.9.1
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-02-13

      Related to WT-3157. Using branch wt-3157-fixes and running
      ./test_wt2909_checkpoint_integrity -v, I eventually see the hang. Running ./test_wt2909_checkpoint_integrity subtest -v -p -o 143 -n 50000 also hangs occasionally.
      This has only been tested on OS X so far.
      Stack trace:

      (lldb) bt all
      * thread #1: tid = 0x147e51a, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
        * frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d081a6e0, cond=0x00007fb2d04049b0, usecs=200, run_func=0x0000000000000000, signalled=0x00007fff5c2d6b4f) + 397 at os_mtx_cond.c:87
          frame #3: 0x00000001039f4561 test_wt2909_checkpoint_integrity`__wt_cond_wait(session=0x00007fb2d081a6e0, cond=0x00007fb2d04049b0, usecs=200, run_func=0x0000000000000000) + 49 at misc.i:19
          frame #4: 0x00000001039f73a6 test_wt2909_checkpoint_integrity`__log_wait_for_earlier_slot(session=0x00007fb2d081a6e0, slot=0x00007fb2d0840788) + 214 at log.c:50
          frame #5: 0x00000001039f6ddc test_wt2909_checkpoint_integrity`__wt_log_release(session=0x00007fb2d081a6e0, slot=0x00007fb2d0840788, freep=0x00007fff5c2d6c6f) + 396 at log.c:1465
          frame #6: 0x00000001039f9537 test_wt2909_checkpoint_integrity`__log_write_internal(session=0x00007fb2d081a6e0, record=0x00007fb2d0504060, lsnp=0x0000000000000000, flags=20) + 919 at log.c:2153
          frame #7: 0x00000001039f9171 test_wt2909_checkpoint_integrity`__wt_log_write(session=0x00007fb2d081a6e0, record=0x00007fb2d0504060, lsnp=0x0000000000000000, flags=20) + 1393 at log.c:2057
          frame #8: 0x0000000103a832d6 test_wt2909_checkpoint_integrity`__wt_txn_log_commit(session=0x00007fb2d081a6e0, cfg=0x0000000000000000) + 86 at txn_log.c:222
          frame #9: 0x0000000103a7b5c9 test_wt2909_checkpoint_integrity`__wt_txn_commit(session=0x00007fb2d081a6e0, cfg=0x0000000000000000) + 809 at txn.c:583
          frame #10: 0x00000001039c07f6 test_wt2909_checkpoint_integrity`__curfile_insert(cursor=0x00007fb2d18006d0) + 1830 at cur_file.c:263
          frame #11: 0x0000000103a1924c test_wt2909_checkpoint_integrity`__wt_metadata_update(session=0x00007fb2d081a6e0, key="file:subtest.wt", value="access_pattern_hint=none,allocation_size=4KB,app_metadata=,block_allocation=best,block_compressor=,cache_resident=false,checkpoint=(WiredTigerCheckpoint.1=(addr=\"018281e4d55d549c8381e40c5855ca808080808080e22fc0dfc0\",order=1,time=1486063611,size=12288,write_gen=2)),checkpoint_lsn=(1,7168),checksum=uncompressed,collator=,columns=(id,v0,v1,v2,big),dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=2,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=i,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=0,log=(enabled=true),memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=75,value_format=iiiS,version=(major=1,minor=1)") + 556 at meta_table.c:212
          frame #12: 0x0000000103a1a4ca test_wt2909_checkpoint_integrity`__meta_track_unroll(session=0x00007fb2d081a6e0, trk=0x00007fb2d205d0f8) + 490 at meta_track.c:216
          frame #13: 0x0000000103a19f08 test_wt2909_checkpoint_integrity`__wt_meta_track_off(session=0x00007fb2d081a6e0, need_sync=false, unroll=true) + 216 at meta_track.c:260
          frame #14: 0x0000000103a7f074 test_wt2909_checkpoint_integrity`__txn_checkpoint(session=0x00007fb2d081a6e0, cfg=0x00007fff5c2d7300) + 4004 at txn_ckpt.c:846
          frame #15: 0x0000000103a7d254 test_wt2909_checkpoint_integrity`__txn_checkpoint_wrapper(session=0x00007fb2d081a6e0, cfg=0x00007fff5c2d7300) + 180 at txn_ckpt.c:906
          frame #16: 0x0000000103a7d06f test_wt2909_checkpoint_integrity`__wt_txn_checkpoint(session=0x00007fb2d081a6e0, cfg=0x00007fff5c2d7300, waiting=true) + 271 at txn_ckpt.c:959
          frame #17: 0x0000000103a64d1e test_wt2909_checkpoint_integrity`__session_checkpoint(wt_session=0x00007fb2d081a6e0, config=0x0000000000000000) + 558 at session_api.c:1650
          frame #18: 0x00000001039294fe test_wt2909_checkpoint_integrity`subtest_populate(opts=0x00007fff5c2d7548, close_test=false) + 2334 at main.c:576
          frame #19: 0x00000001039284e5 test_wt2909_checkpoint_integrity`subtest_main(argc=9, argv=0x00007fff5c2d7ef0, close_test=false) + 1285 at main.c:490
          frame #20: 0x0000000103927e48 test_wt2909_checkpoint_integrity`main(argc=9, argv=0x00007fff5c2d7ef0) + 280 at main.c:629
          frame #21: 0x00007fff96ba35ad libdyld.dylib`start + 1
          frame #22: 0x00007fff96ba35ad libdyld.dylib`start + 1
      
        thread #2: tid = 0x147e51d, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d0817e40, cond=0x00007fb2d0404bf0, usecs=100000, run_func=0x0000000000000000, signalled=0x0000700000080e6f) + 397 at os_mtx_cond.c:87
          frame #3: 0x00000001039ae411 test_wt2909_checkpoint_integrity`__wt_cond_wait(session=0x00007fb2d0817e40, cond=0x00007fb2d0404bf0, usecs=100000, run_func=0x0000000000000000) + 49 at misc.i:19
          frame #4: 0x00000001039ad0e6 test_wt2909_checkpoint_integrity`__log_file_server(arg=0x00007fb2d0817e40) + 934 at conn_log.c:521
          frame #5: 0x00007fff8780c99d libsystem_pthread.dylib`_pthread_body + 131
          frame #6: 0x00007fff8780c91a libsystem_pthread.dylib`_pthread_start + 168
          frame #7: 0x00007fff8780a351 libsystem_pthread.dylib`thread_start + 13
      
        thread #3: tid = 0x147e51e, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d0818160, cond=0x00007fb2d0405f10, usecs=109000, run_func=0x0000000000000000, signalled=0x0000700000103e7f) + 397 at os_mtx_cond.c:87
          frame #3: 0x0000000103a6a4bc test_wt2909_checkpoint_integrity`__wt_cond_auto_wait_signal(session=0x00007fb2d0818160, cond=0x00007fb2d0405f10, progress=false, run_func=0x0000000000000000, signalled=0x0000700000103e7f) + 396 at cond_auto.c:62
          frame #4: 0x0000000103a6a588 test_wt2909_checkpoint_integrity`__wt_cond_auto_wait(session=0x00007fb2d0818160, cond=0x00007fb2d0405f10, progress=false, run_func=0x0000000000000000) + 56 at cond_auto.c:82
          frame #5: 0x00000001039ad2a4 test_wt2909_checkpoint_integrity`__log_wrlsn_server(arg=0x00007fb2d0818160) + 356 at conn_log.c:733
          frame #6: 0x00007fff8780c99d libsystem_pthread.dylib`_pthread_body + 131
          frame #7: 0x00007fff8780c91a libsystem_pthread.dylib`_pthread_start + 168
          frame #8: 0x00007fff8780a351 libsystem_pthread.dylib`thread_start + 13
      
        thread #4: tid = 0x147e51f, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d0818480, cond=0x00007fb2d0406480, usecs=1000000, run_func=0x0000000000000000, signalled=0x0000700000186e9e) + 397 at os_mtx_cond.c:87
          frame #3: 0x0000000103a6a4bc test_wt2909_checkpoint_integrity`__wt_cond_auto_wait_signal(session=0x00007fb2d0818480, cond=0x00007fb2d0406480, progress=false, run_func=0x0000000000000000, signalled=0x0000700000186e9e) + 396 at cond_auto.c:62
          frame #4: 0x00000001039ad51a test_wt2909_checkpoint_integrity`__log_server(arg=0x00007fb2d0818480) + 506 at conn_log.c:840
          frame #5: 0x00007fff8780c99d libsystem_pthread.dylib`_pthread_body + 131
          frame #6: 0x00007fff8780c91a libsystem_pthread.dylib`_pthread_start + 168
          frame #7: 0x00007fff8780a351 libsystem_pthread.dylib`thread_start + 13
      
        thread #5: tid = 0x147e520, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d0818ac0, cond=0x00007fb2d04050c0, usecs=1000000, run_func=0x0000000000000000, signalled=0x0000700000209e4f) + 397 at os_mtx_cond.c:87
          frame #3: 0x0000000103a6a4bc test_wt2909_checkpoint_integrity`__wt_cond_auto_wait_signal(session=0x00007fb2d0818ac0, cond=0x00007fb2d04050c0, progress=false, run_func=0x0000000000000000, signalled=0x0000700000209e4f) + 396 at cond_auto.c:62
          frame #4: 0x0000000103a6a588 test_wt2909_checkpoint_integrity`__wt_cond_auto_wait(session=0x00007fb2d0818ac0, cond=0x00007fb2d04050c0, progress=false, run_func=0x0000000000000000) + 56 at cond_auto.c:82
          frame #5: 0x00000001039e9e0f test_wt2909_checkpoint_integrity`__wt_evict_thread_run(session=0x00007fb2d0818ac0, thread=0x00007fb2d0406760) + 367 at evict_lru.c:316
          frame #6: 0x0000000103a79943 test_wt2909_checkpoint_integrity`__wt_thread_run(arg=0x00007fb2d0406760) + 67 at thread_group.c:25
          frame #7: 0x00007fff8780c99d libsystem_pthread.dylib`_pthread_body + 131
          frame #8: 0x00007fff8780c91a libsystem_pthread.dylib`_pthread_start + 168
          frame #9: 0x00007fff8780a351 libsystem_pthread.dylib`thread_start + 13
      
        thread #6: tid = 0x147e521, 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #0: 0x00007fff926dadb6 libsystem_kernel.dylib`__psynch_cvwait + 10
          frame #1: 0x00007fff8780d728 libsystem_pthread.dylib`_pthread_cond_wait + 767
          frame #2: 0x0000000103a2534d test_wt2909_checkpoint_integrity`__wt_cond_wait_signal(session=0x00007fb2d081a3c0, cond=0x00007fb2d040cc80, usecs=10000000, run_func=(test_wt2909_checkpoint_integrity`__sweep_server_run_chk at conn_sweep.c:254), signalled=0x000070000028ce7f) + 397 at os_mtx_cond.c:87
          frame #3: 0x00000001039b2881 test_wt2909_checkpoint_integrity`__wt_cond_wait(session=0x00007fb2d081a3c0, cond=0x00007fb2d040cc80, usecs=10000000, run_func=(test_wt2909_checkpoint_integrity`__sweep_server_run_chk at conn_sweep.c:254)) + 49 at misc.i:19
          frame #4: 0x00000001039b2418 test_wt2909_checkpoint_integrity`__sweep_server(arg=0x00007fb2d081a3c0) + 88 at conn_sweep.c:281
          frame #5: 0x00007fff8780c99d libsystem_pthread.dylib`_pthread_body + 131
          frame #6: 0x00007fff8780c91a libsystem_pthread.dylib`_pthread_start + 168
          frame #7: 0x00007fff8780a351 libsystem_pthread.dylib`thread_start + 13
      (lldb)
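
      Because the subtest command only hangs occasionally, a small driver that reruns it under a timeout may help catch a failing run. The following is a minimal sketch, not part of the WiredTiger test suite: the helper name run_until_hang, the timeout, and the run count are all assumptions, and the command is copied from the description above. Note that subprocess.run kills the child when the timeout expires, so this only detects a suspected hang; it does not leave the process alive for lldb.

```python
import subprocess
import sys

# Reproduction command copied from the description above.
REPRO_CMD = ["./test_wt2909_checkpoint_integrity", "subtest",
             "-v", "-p", "-o", "143", "-n", "50000"]


def run_until_hang(cmd, timeout_s, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exceeded timeout_s
    (treated as a suspected hang), or None if every run finished in time.
    The exit status of each run is ignored.
    """
    for run in range(1, max_runs + 1):
        try:
            subprocess.run(cmd, timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            # The child has already been killed by subprocess.run here.
            print(f"run {run} exceeded {timeout_s}s", file=sys.stderr)
            return run
    return None
```

      For example, run_until_hang(REPRO_CMD, timeout_s=300, max_runs=100) would stop at the first run that takes longer than five minutes; both numbers are guesses and should be tuned to how long a normal run takes on the test machine.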
      

            Assignee: Susan LoVerso (sue.loverso@mongodb.com)
            Reporter: Donald Anderson (donald.anderson@mongodb.com)
