WiredTiger / WT-573

Unable to proceed due to cache full with LSM test/format

    • Type: Task
    • Resolution: Done
    • Affects Version/s: None
    • Component/s: None

      Running test/format with the following configuration:

      ############################################
      #  RUN PARAMETERS
      ############################################
      # bitcnt not applicable to this run
      cache=94                
      compression=bzip
      data_extend=0   
      data_source=lsm
      delete_pct=14
      dictionary=0
      file_type=row-store
      hot_backups=0   
      huffman_key=0           
      huffman_value=0 
      insert_pct=40           
      internal_key_truncation=0
      internal_page_max=14
      key_gap=4
      key_max=102     
      key_min=27
      leaf_page_max=21
      ops=382656
      prefix=1
      repeat_data_pct=37 
      reverse=0               
      rows=600067                     
      runs=0                          
      split_pct=65                    
      threads=10                          
      value_max=2186                  
      value_min=3             
      # wiredtiger_config not applicable to this run
      write_pct=5             
      ############################################
      

      The application ends up stuck: it is not making any progress at all. All application threads have the following call stack:

      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
      #1  0x0000000000423c5f in __wt_cond_wait (session=0x8c4b40, cond=0x8cb8a0,
          usecs=10000) at ../src/os_posix/os_mtx.c:75
      #2  0x000000000044488a in __wt_cache_full_check (session=0x8c4b40, onepass=0)
          at ../src/include/cache.i:87
      #3  0x000000000044498b in __wt_page_in_func (session=0x8c4b40,
          parent=0x7fffe8b9b550, ref=0x7fffe8b9bad0,
          file=0x66beb6 "../src/btree/row_srch.c", line=201)
          at ../src/btree/bt_page.c:47
      #4  0x00000000004a3c3a in __wt_page_swap_func (session=0x8c4b40,
          out=0x7fffe8b9b550, in=0x7fffe8b9b550, inref=0x7fffe8b9bad0,
          file=0x66beb6 "../src/btree/row_srch.c", line=201)
          at ../src/include/btree.i:489
      

      The eviction server is looping as expected, populating the eviction queue. However, the WT_EVICT_NO_PROGRESS flag is never cleared, so no pages are being successfully evicted.

      The WT_EVICT_STUCK flag is set, but the clause at bt_evict.c:__evict_get_page:961 that aborts transactions is never firing. I wonder if the __wt_txn_oldest check isn't working as expected?

      We should figure out how to make progress. I suspect that all pages have open hazard references. I'll need to look more carefully at the state of the cache.
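
The suspicion above, that every page holds an open hazard reference, can be illustrated with a toy model. This is only a sketch under stated assumptions, not WiredTiger's eviction code: the `sketch_*` types and functions and the `SKETCH_EVICT_NO_PROGRESS` flag are hypothetical names made up for this example. It shows how an eviction pass that skips pinned pages evicts nothing and leaves a no-progress flag set for as long as every page stays pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this sketch; not WiredTiger's definition. */
#define SKETCH_EVICT_NO_PROGRESS 0x1u

struct sketch_page {
    bool in_cache;        /* page currently occupies cache space */
    unsigned hazard_refs; /* readers currently pinning the page */
};

struct sketch_cache {
    struct sketch_page *pages;
    size_t npages;
    unsigned flags;
};

/*
 * One eviction pass: evict every cached page that has no hazard
 * references. Returns the number of pages evicted and records whether
 * the pass made any progress.
 */
static size_t
sketch_evict_pass(struct sketch_cache *cache)
{
    size_t evicted = 0;

    for (size_t i = 0; i < cache->npages; i++) {
        struct sketch_page *page = &cache->pages[i];

        /* A pinned page cannot be evicted; skip it. */
        if (!page->in_cache || page->hazard_refs > 0)
            continue;
        page->in_cache = false;
        evicted++;
    }

    if (evicted == 0)
        cache->flags |= SKETCH_EVICT_NO_PROGRESS;
    else
        cache->flags &= ~SKETCH_EVICT_NO_PROGRESS;
    return evicted;
}

/* Drop one hazard reference, as a reader would when leaving a page. */
static void
sketch_hazard_clear(struct sketch_page *page)
{
    assert(page->hazard_refs > 0);
    page->hazard_refs--;
}
```

In this model, if all pages have `hazard_refs > 0`, every call to `sketch_evict_pass` returns 0 and the flag stays set; as soon as one reader calls `sketch_hazard_clear` and its page becomes unpinned, the next pass evicts that page and clears the flag. Whether the real hang follows this pattern would still need to be confirmed by inspecting the cache state.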

            Assignee:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod