WiredTiger / WT-2766

Don't count eviction of lookaside file pages for the purpose of checking stuck cache

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.9.0, 3.2.10, 3.3.12
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      This issue was noticed while working on WT-2731, with the following test/format configuration:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      backups=0
      bitcnt=4
      bloom=1
      bloom_bit_count=4
      bloom_hash_count=30
      bloom_oldest=0
      cache=5
      checkpoints=1
      checksum=uncompressed
      chunk_size=4
      compaction=0
      compression=zlib
      data_extend=0
      data_source=file
      delete_pct=35
      dictionary=0
      direct_io=0
      encryption=rotn-7
      evict_max=0
      file_type=row-store
      firstfit=1
      huffman_key=0
      huffman_value=0
      in_memory=0
      insert_pct=24
      internal_key_truncation=1
      internal_page_max=11
      isolation=snapshot
      key_gap=9
      key_max=51
      key_min=18
      leaf_page_max=17
      leak_memory=0
      logging=0
      logging_archive=0
      logging_compression=none
      logging_prealloc=0
      long_running_txn=0
      lsm_worker_threads=3
      merge_max=16
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=6
      quiet=1
      repeat_data_pct=66
      reverse=0
      rows=100000
      runs=1
      rebalance=1
      salvage=1
      split_pct=69
      statistics=1
      statistics_server=0
      threads=4
      timer=20
      transaction-frequency=87
      value_max=3294
      value_min=8
      verify=1
      wiredtiger_config=
      write_pct=71
      ############################################
      

      The cache dump looks like:

      ==========
      cache dump
      file:wt(<live>): 
      	internal pages: 1 pages, 1545 max, 0MB total
      	leaf pages: 4 pages, 1392140 max, 4MB total
      	dirty pages: 1 pages, 1545 max, 0MB total
      file:WiredTigerLAS.wt(<live>): 
      	internal pages: 1 pages, 249 max, 0MB total
      	leaf pages: 1 pages, 412 max, 0MB total
      	dirty pages: 1 pages, 249 max, 0MB total
      file:WiredTiger.wt(<live>): 
      	internal pages: 1 pages, 249 max, 0MB total
      	dirty pages: 1 pages, 249 max, 0MB total
      cache dump: total found = 5MB vs tracked inuse 5MB
      ==========
      

      There are 4 clean leaf pages and 4 threads running snapshot isolation transactions, each pinning a single page. In this case, I'd expect the stuck cache check to fire, but it does not. After some time in a debugger, it appears that some eviction activity is still happening via the lookaside file:

      (gdb) where
      #0  __wt_cache_page_evict (session=0x632000001500, page=0x6080000d2020)
          at ../src/include/btree.i:302
      #1  0x0000000000ac1d1e in __wt_page_out (session=0x632000001500, pagep=0x60400000d790)
          at ../src/btree/bt_discard.c:104
      #2  0x0000000000ac0d93 in __wt_ref_out (session=0x632000001500, ref=0x60400000d790)
          at ../src/btree/bt_discard.c:33
      #3  0x0000000000651ae5 in __evict_page_clean_update (session=0x632000001500,
          ref=0x60400000d790, closing=false) at ../src/evict/evict_page.c:224
      #4  0x000000000064cfe0 in __wt_evict (session=<optimized out>, ref=<optimized out>,
          closing=<optimized out>) at ../src/evict/evict_page.c:121
      #5  0x0000000000628dd6 in __evict_page (session=0x632000001500, is_server=true)
          at ../src/evict/evict_lru.c:1665
      #6  0x0000000000639baa in __evict_lru_pages (session=0x632000001500, is_server=true)
          at ../src/evict/evict_lru.c:916
      #7  0x000000000063b93b in __evict_pass (session=0x632000001500) at ../src/evict/evict_lru.c:677
      #8  0x00000000006368ab in __evict_server (session=0x632000001500, did_work=0x7f68c5ffee30)
          at ../src/evict/evict_lru.c:271
      #9  0x000000000061c5f9 in __evict_thread_run (arg=0x632000001500)
          at ../src/evict/evict_lru.c:207
      #10 0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
      #11 0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
      (gdb) p page->memory_footprint
      $43 = 412
      (gdb) p page->dsk
      $44 = (const WT_PAGE_HEADER *) 0x6120002824c0
      (gdb) p *$44
      $45 = {recno = 0, write_gen = 123, mem_size = 316, u = {entries = 8, datalen = 8},
        type = 7 '\a', flags = 12 '\f', unused = "\000"}
      (gdb) p session->dhandle->name
      $46 = 0x60300000d750 "file:WiredTigerLAS.wt"
      

      Specifically, it is the __wt_las_sweep function that triggers this cache activity:

      (gdb) where
      #0  __wt_las_sweep (session=0x632000001840) at ../src/cache/cache_las.c:289
      #1  0x00000000005bf76d in __sweep_server (arg=0x632000001840) at ../src/conn/conn_sweep.c:283
      #2  0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
      #3  0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
      

      We should stop counting eviction of lookaside file pages toward the cache->evict_page count, so that the diagnostic stuck cache check fires as expected.
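
      To illustrate the idea, here is a minimal sketch in C. It is not the actual WiredTiger change: cache_model, model_page_evict, model_cache_stuck and the limit parameter are all hypothetical names invented for this example. It only models a stuck check that watches an eviction counter, and shows that skipping lookaside evictions lets the check fire when no other page is evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are not WiredTiger's structures. */
struct cache_model {
    uint64_t pages_evicted;     /* progress counter read by the stuck check */
    uint64_t last_seen_evicted; /* counter value at the previous check */
    unsigned stalled_checks;    /* consecutive checks with no progress */
};

/*
 * Record an eviction. Lookaside pages are deliberately not counted, so
 * sweeping the lookaside file does not look like cache progress.
 */
static void
model_page_evict(struct cache_model *cache, bool is_lookaside)
{
    assert(cache != NULL);
    if (!is_lookaside)
        ++cache->pages_evicted;
}

/*
 * Return true once `limit` consecutive checks have seen no counted
 * eviction. Any counted eviction resets the stall count.
 */
static bool
model_cache_stuck(struct cache_model *cache, unsigned limit)
{
    assert(cache != NULL);
    if (cache->pages_evicted != cache->last_seen_evicted) {
        cache->last_seen_evicted = cache->pages_evicted;
        cache->stalled_checks = 0;
        return (false);
    }
    return (++cache->stalled_checks >= limit);
}
```

      In this model, three checks in a row that see only lookaside evictions report a stuck cache with a limit of 3, while a single eviction of any other page resets the stall count.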

            Assignee: David Hows (david.hows)
            Reporter: Sulabh Mahajan (sulabh.mahajan@mongodb.com)
            Votes: 0
            Watchers: 4
