WiredTiger / WT-1910

Deadlock between LSM drop and application checkpoint

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • WT2.6.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Jenkins tests have hung running test/format with an LSM workload. The hang reproduces for me after ~10 minutes.

      The test/format configuration file is:

      ############################################                                    
      #  RUN PARAMETERS                                                               
      ############################################                                    
      abort=0                                                                         
      auto_throttle=1                                                                 
      firstfit=1                                                                      
      bitcnt=3                                                                        
      bloom=1                                                                         
      bloom_bit_count=23                                                              
      bloom_hash_count=26                                                             
      bloom_oldest=0                                                                  
      cache=60                                                                        
      checkpoints=1                                                                   
      checksum=off                                                                    
      chunk_size=2                                                                    
      compaction=0                                                                    
      compression=bzip                                                                
      data_extend=0                                                                   
      data_source=lsm                                                                 
      delete_pct=37                                                                   
      dictionary=0                                                                    
      evict_max=5                                                                     
      file_type=row-store                                                             
      backups=0                                                                       
      huffman_key=0                                                                   
      huffman_value=0                                                                 
      insert_pct=50                                                                   
      internal_key_truncation=1                                                       
      internal_page_max=13                                                            
      isolation=random                                                                
      key_gap=19                                                                      
      key_max=26                                                                      
      key_min=14                                                                      
      leak_memory=0                                                                   
      leaf_page_max=17                                                                
      logging=1                                                                       
      logging_archive=1                                                               
      logging_prealloc=1                                                              
      lsm_worker_threads=3                                                            
      merge_max=5                                                                     
      mmap=1                                                                          
      ops=100000                                                                      
      prefix_compression=1                                                            
      prefix_compression_min=7                                                        
      repeat_data_pct=80                                                              
      reverse=0                                                                       
      rows=100000                                                                     
      runs=100                                                                        
      split_pct=47                                                                    
      statistics=0                                                                    
      statistics_server=0                                                             
      threads=27                                                                      
      timer=20                                                                        
      value_max=2374                                                                  
      value_min=13                                                                    
      wiredtiger_config=                                                              
      write_pct=24                                                                    
      ############################################
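
      Most of the settings above are randomized by test/format. As a minimal sketch for triage, the snippet below parses the name=value lines and pulls out the few settings that appear to line up with the threads in the stack traces (LSM data source, checkpoints, LSM worker threads, Bloom filters). The function names and the RELEVANT selection are illustrative assumptions, not part of test/format.

```python
def parse_format_config(text):
    """Parse 'name=value' lines, skipping blank lines and '#' banner lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


# Editorial selection (an assumption): settings that seem related to the
# checkpoint thread and the LSM worker threads seen in the traces.
RELEVANT = ("data_source", "checkpoints", "lsm_worker_threads",
            "bloom", "merge_max", "threads")


def relevant_settings(settings, names=RELEVANT):
    """Return only the selected settings that are present in the config."""
    return {name: settings[name] for name in names if name in settings}
```

      Passing the configuration text above through parse_format_config and then relevant_settings yields a small dictionary that is easier to compare between hanging and non-hanging runs.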
      

      The relevant stack traces are from an application thread running a checkpoint and two LSM worker threads:

      Thread 17 (Thread 0x7f6c47fff700 (LWP 4190)):
      #0  0x00000037cdaec5f3 in select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x00000000004474f0 in __wt_sleep (seconds=0, micro_seconds=10)
          at ../src/os_posix/os_sleep.c:23
      #2  0x0000000000445cf7 in __wt_readlock (session=0x1e0ee40, 
          rwlock=0x7f6c581d87a0) at ../src/os_posix/os_mtx_rw.c:120
      #3  0x000000000046bc65 in __wt_session_lock_dhandle (session=0x1e0ee40, 
          flags=0, deadp=0x7f6c47ffeaa0) at ../src/session/session_dhandle.c:91
      #4  0x000000000046c7d5 in __wt_session_get_btree (session=0x1e0ee40, 
          uri=0x7f6c58222f10 "file:wt-000048.lsm", checkpoint=0x0, cfg=0x0, flags=0)
          at ../src/session/session_dhandle.c:370
      #5  0x00000000004b29db in __conn_btree_apply_internal (session=0x1e0ee40, 
          dhandle=0x7f6c581db3f0, func=0x476d94 <__wt_checkpoint_list>, 
          cfg=0x7f6c47ffed80) at ../src/conn/conn_dhandle.c:534
      #6  0x00000000004b2d1b in __wt_conn_btree_apply (session=0x1e0ee40, 
          apply_checkpoints=0, uri=0x0, func=0x476d94 <__wt_checkpoint_list>, 
          cfg=0x7f6c47ffed80) at ../src/conn/conn_dhandle.c:588
      #7  0x0000000000476b08 in __checkpoint_apply_all (session=0x1e0ee40, 
          cfg=0x7f6c47ffed80, op=0x476d94 <__wt_checkpoint_list>, fullp=0x0)
          at ../src/txn/txn_ckpt.c:158
      #8  0x0000000000477835 in __wt_txn_checkpoint (session=0x1e0ee40, 
          cfg=0x7f6c47ffed80) at ../src/txn/txn_ckpt.c:377
      #9  0x000000000046a9b4 in __session_checkpoint (wt_session=0x1e0ee40, 
          config=0x0) at ../src/session/session_api.c:919
      #10 0x0000000000411f53 in ops (arg=0x208fff8) at ../../../test/format/ops.c:351
      

      and

      Thread 30 (Thread 0x7f6c6641b700 (LWP 4141)):
      #0  __lll_lock_wait ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      #1  0x00000037ce20a11b in _L_lock_812 () from /lib64/libpthread.so.0
      #2  0x00000037ce209fe8 in __GI___pthread_mutex_lock (mutex=0x1dfe900)
          at ../nptl/pthread_mutex_lock.c:79
      #3  0x000000000046ba1c in __wt_spin_lock (session=0x1e08300, t=0x1dfe900)
          at ../src/include/mutex.i:159
      #4  0x000000000046be30 in __wt_session_release_btree (session=0x1e08300)
          at ../src/session/session_dhandle.c:148
      #5  0x00000000004db7fb in __lsm_discard_handle (session=0x1e08300, 
          uri=0x7f6c58005bb0 "file:wt-000048.lsm", checkpoint=0x0)
          at ../src/lsm/lsm_work_unit.c:461
      #6  0x00000000004dad55 in __wt_lsm_checkpoint_chunk (session=0x1e08300, 
          lsm_tree=0x2020b60, chunk=0x7f6c5822ddc0) at ../src/lsm/lsm_work_unit.c:268
      #7  0x0000000000440342 in __lsm_worker_general_op (session=0x1e08300, 
          cookie=0x1e02770, completed=0x7f6c6641aee0) at ../src/lsm/lsm_worker.c:65
      #8  0x00000000004404d5 in __lsm_worker (arg=0x1e02770)
          at ../src/lsm/lsm_worker.c:122
      #9  0x00000037ce207ee5 in start_thread (arg=0x7f6c6641b700)
          at pthread_create.c:309
      #10 0x00000037cdaf4d1d in clone ()
          at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      

      and

      Thread 29 (Thread 0x7f6c65419700 (LWP 4142)):
      #0  __lll_lock_wait ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      #1  0x00000037ce20a11b in _L_lock_812 () from /lib64/libpthread.so.0
      #2  0x00000037ce209fe8 in __GI___pthread_mutex_lock (mutex=0x1dfe9c0)
          at ../nptl/pthread_mutex_lock.c:79
      #3  0x0000000000466329 in __wt_spin_lock (session=0x1e085c0, t=0x1dfe9c0)
          at ../src/include/mutex.i:159
      #4  0x0000000000467ceb in __session_create (wt_session=0x1e085c0, 
          uri=0x7f6c5c365180 "file:wt-000050.bf", 
          config=0x7f6c5c34ca10 ",key_format=r,value_format=1t,exclusive=true")
          at ../src/session/session_api.c:434
      #5  0x00000000004edcd9 in __wt_bloom_finalize (bloom=0x7f6c5c0ff5b0)
          at ../src/bloom/bloom.c:212
      #6  0x00000000004d8117 in __wt_lsm_merge (session=0x1e085c0, 
          lsm_tree=0x2020b60, id=2) at ../src/lsm/lsm_merge.c:401
      #7  0x00000000004405b1 in __lsm_worker (arg=0x1e02798)
          at ../src/lsm/lsm_worker.c:138
      #8  0x00000037ce207ee5 in start_thread (arg=0x7f6c65419700)
          at pthread_create.c:309
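
      To make the three traces easier to compare, here is a minimal sketch that scans gdb backtrace text and reports, for each thread, the WiredTiger lock call it is blocked in, the lock address, and the calling function. The helper names and the LOCK_FUNCS set are assumptions for illustration; the script only reads the text and does not determine which thread holds each lock.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Frame number, optional "0x... in ", function name, then the argument list.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \((.*)\)")
# Argument names used for the lock pointer in the frames shown in this ticket.
LOCK_ARG_RE = re.compile(r"\b(?:t|rwlock)=(0x[0-9a-f]+)")
# Assumed set of lock entry points to look for.
LOCK_FUNCS = {"__wt_spin_lock", "__wt_readlock", "__wt_writelock"}


def parse_threads(backtrace):
    """Return {thread number: [frame text]}, joining wrapped frame lines."""
    threads, current = {}, None
    for raw in backtrace.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current thread's trace
            continue
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
        elif current is not None and line.startswith("#"):
            threads[current].append(line)
        elif current is not None and threads[current]:
            threads[current][-1] += " " + line  # continuation of a frame
    return threads


def summarize_waiters(backtrace):
    """Map thread number to (lock function, lock address, caller)."""
    result = {}
    for tid, frames in parse_threads(backtrace).items():
        parsed = [m.groups() for m in map(FRAME_RE.match, frames) if m]
        for i, (func, args) in enumerate(parsed):
            if func in LOCK_FUNCS:
                addr = LOCK_ARG_RE.search(args)
                caller = parsed[i + 1][0] if i + 1 < len(parsed) else None
                result[tid] = (func, addr.group(1) if addr else None, caller)
                break
    return result
```

      Applied to the traces above, this should show thread 17 waiting in __wt_readlock on rwlock 0x7f6c581d87a0 from __wt_session_lock_dhandle, and threads 30 and 29 waiting in __wt_spin_lock on two different spin locks (0x1dfe900 and 0x1dfe9c0).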
      

            Assignee: Unassigned
            Reporter: Alexander Gorrod (alexander.gorrod@mongodb.com)