WiredTiger / WT-2891

Test format can hang indefinitely when cache's dirty threshold is exceeded

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      This came up in Jenkins: runs with the following parameters got "stuck" and ran forever without terminating because of a stuck cache.

      This reproduces readily on Linux for me.

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      backups=0
      bitcnt=3
      bloom=1
      bloom_bit_count=45
      bloom_hash_count=8
      bloom_oldest=0
      cache=55
      checkpoints=1
      checksum=uncompressed
      chunk_size=1
      compaction=0
      compression=snappy
      data_extend=0
      data_source=table
      delete_pct=5
      dictionary=0
      direct_io=0
      encryption=none
      evict_max=4
      file_type=variable-length column-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      in_memory=0
      insert_pct=31
      internal_key_truncation=1
      internal_page_max=14
      isolation=snapshot
      key_gap=9
      key_max=38
      key_min=26
      leaf_page_max=9
      leak_memory=0
      logging=0
      logging_archive=0
      logging_compression=none
      logging_prealloc=0
      long_running_txn=0
      lsm_worker_threads=4
      merge_max=9
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=1
      quiet=1
      repeat_data_pct=47
      reverse=0
      rows=100000
      runs=100
      rebalance=1
      salvage=1
      split_pct=57
      statistics=1
      statistics_server=0
      threads=18
      timer=20
      transaction-frequency=56
      value_max=1206
      value_min=6
      verify=1
      wiredtiger_config=
      write_pct=83
      ############################################
      

      While digging in, I found that the system had hit a nasty loop within eviction: the evict server would repeatedly walk the cache and find pages to evict, but it would only ever evict clean pages, so cache->bytes_dirty_leaf was just as high after each lap. With overhead applied, the value of cache->bytes_dirty_leaf was above the dirty threshold.

      Math:

      Variable                                                  Value
      cache->bytes_dirty_leaf                                   10679988
      conn->cache_size + 1                                      57671681
      (cache->eviction_dirty_trigger * conn->cache_size)/100    11534336
      cache->overhead_pct                                       8
      dirty_inuse (cache->bytes_dirty_leaf * 1.08)              11534387
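
      The arithmetic above can be reproduced with a minimal sketch in C, assuming plain integer math. The helper and macro names below are hypothetical and are not WiredTiger functions. The dirty trigger of 20 percent is not stated in this ticket; it is inferred from 11534336 * 100 / 57671680, and the cache size is taken as 57671681 - 1 (cache=55, in MB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the table in this ticket. */
#define BYTES_DIRTY_LEAF UINT64_C(10679988)
#define CACHE_SIZE UINT64_C(57671680) /* 57671681 - 1, i.e. 55 * 1048576 */
#define OVERHEAD_PCT UINT64_C(8)
/* ASSUMPTION: inferred from the trigger value, not stated in the ticket. */
#define DIRTY_TRIGGER_PCT UINT64_C(20)

/* Hypothetical helper: inflate a byte count by an overhead percentage. */
static uint64_t
bytes_plus_overhead(uint64_t bytes, uint64_t overhead_pct)
{
	assert(overhead_pct <= 100);
	return (bytes + (bytes * overhead_pct) / 100);
}

/* Hypothetical helper: the dirty trigger expressed in bytes. */
static uint64_t
dirty_trigger_bytes(uint64_t cache_size, uint64_t dirty_trigger_pct)
{
	return ((dirty_trigger_pct * cache_size) / 100);
}

/* True if the dirty byte count is strictly above the trigger. */
static bool
over_dirty_trigger(uint64_t dirty_bytes, uint64_t trigger)
{
	return (dirty_bytes > trigger);
}
```

      Calling over_dirty_trigger() with the raw BYTES_DIRTY_LEAF against dirty_trigger_bytes(CACHE_SIZE, DIRTY_TRIGGER_PCT) gives false, with the raw count 854348 bytes under the trigger. Calling it with bytes_plus_overhead(BYTES_DIRTY_LEAF, OVERHEAD_PCT) gives true, with the adjusted count 51 bytes over the trigger. So whether the dirty threshold counts as exceeded here depends on whether the overhead is applied.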

            Assignee: [DO NOT USE] Backlog - Storage Execution Team (backlog-server-execution)
            Reporter: David Hows (david.hows)