WiredTiger / WT-3214

workload stuck due to eviction of dirty data

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: WT2.9.1
    • Component/s: None

      Hi!

      We have a very simple workload that sequentially scans one table and writes slightly modified data to another (it's an upgrade needed because of a small change in the table key encoding; a minimal sketch of the loop is shown below).
      The table has 2 indexes with the following sizes (no compression):

       $ ls -lh db/persistence/sor_se/si_db/wt/custom_audit*
      -rw-r--r-- 1 sbn tbeng 620K Mar  9 10:44 db/persistence/sor_se/si_db/wt/custom_audit_idx-175.wti
      -rw-r--r-- 1 sbn tbeng 348K Mar  9 10:44 db/persistence/sor_se/si_db/wt/custom_audit_idx-65.wti
      -rw-r--r-- 1 sbn tbeng  57M Mar  9 10:44 db/persistence/sor_se/si_db/wt/custom_audit.wt
      

      I.e. the total size is around 58M.
      The cache size is 128M.
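      For context, here is a minimal sketch of what the scan-and-rewrite loop looks like against the WiredTiger C cursor API. The table URIs and the string key/value formats are hypothetical; the real workload re-encodes the key slightly while copying.

      #include <wiredtiger.h>

      /* Copy every row from the old table into the new one (sketch only). */
      static int
      copy_table(WT_SESSION *session)
      {
          WT_CURSOR *src, *dst;
          const char *key, *value;
          int ret;

          /* Hypothetical URIs; both tables assumed key_format=S,value_format=S. */
          if ((ret = session->open_cursor(
              session, "table:custom_audit_old", NULL, NULL, &src)) != 0)
              return (ret);
          if ((ret = session->open_cursor(
              session, "table:custom_audit", NULL, NULL, &dst)) != 0)
              return (ret);

          while ((ret = src->next(src)) == 0) {
              src->get_key(src, &key);
              src->get_value(src, &value);

              /* The real workload slightly re-encodes the key here. */
              dst->set_key(dst, key);
              dst->set_value(dst, value);
              if ((ret = dst->insert(dst)) != 0)
                  break;
          }

          src->close(src);
          dst->close(dst);
          return (ret == WT_NOTFOUND ? 0 : ret);
      }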

      In WiredTiger 2.8 this upgrade took 25 seconds; in 2.9.1 it seems to be stuck forever with stacks like:

      application thread:

      calloc 
      __wt_calloc 
      __wt_row_insert_alloc 
      __wt_row_modify 
      __split_multi_inmem 
      __wt_split_rewrite 
      __evict_page_dirty_update 
      __wt_evict 
      __evict_page 
      __wt_cache_eviction_worker 
      __wt_cache_eviction_check 
      __wt_txn_begin 
      __session_begin_transaction
      ...
      

      eviction thread:

      __wt_row_insert_alloc 
      __wt_row_modify 
      __split_multi_inmem 
      __wt_split_rewrite 
      __evict_page_dirty_update 
      __wt_evict 
      __evict_page 
      __evict_lru_pages 
      __evict_pass 
      __evict_server 
      __wt_evict_thread_run 
      __wt_thread_run 
      start_thread 
      clone 
      

      It also constantly consumes around 100-150% CPU (as reported by Linux top).

      I'm sure this is caused by the change to the dirty-data eviction settings made in 2.9.0.
      If I set them back to the values they had in 2.8.0, the workload finishes in 30 seconds (see the sketch below).
      So some performance degradation for such a write-heavy workload is expected (WT-3089).
      But is a complete lock-up expected?
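      For reference, this is roughly how I pin the dirty-eviction settings back toward the pre-2.9 behaviour when opening the connection. The option names (eviction_dirty_target, eviction_dirty_trigger) are real wiredtiger_open settings, but the specific percentages below are my own choice, not necessarily the exact 2.8.0 defaults.

      #include <stdio.h>
      #include <wiredtiger.h>

      int
      main(void)
      {
          WT_CONNECTION *conn;
          int ret;

          /* Raise the dirty-eviction thresholds; the values are illustrative. */
          ret = wiredtiger_open("db/persistence/sor_se/si_db/wt", NULL,
              "create,cache_size=128M,"
              "eviction_dirty_target=80,eviction_dirty_trigger=95",
              &conn);
          if (ret != 0) {
              fprintf(stderr, "wiredtiger_open: %s\n", wiredtiger_strerror(ret));
              return (1);
          }

          /* ... run the upgrade workload here ... */

          return (conn->close(conn, NULL));
      }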

      Thanks!

        1. stats.tgz (21.83 MB, Dmitri Shubin)

            Assignee: Alexander Gorrod (alexander.gorrod@mongodb.com)
            Reporter: Dmitri Shubin
            Votes: 0
            Watchers: 3
