Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-8356

Fix cache stuck with dirty content during checkpoint at conn_close.

    • Type: Icon: Build Failure Build Failure
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Storage - Ra 2021-11-29, Storage - Ra 2021-12-13, Storage - Ra 2022-01-10, Storage - Ra 2022-01-24, Storage - Ra 2022-03-07

      This is most likely caused by overflow items in table #9 (T0009.wt). See comments for details.

      Evergreen test failure here:

      https://evergreen.mongodb.com/task/wiredtiger_rhel80_linux_directio_2d685945fee4792c94c7b855cc449fc2656eda0d_21_11_05_05_01_42/0

      This is possibly related to WT-8054.  In both cases the only activity (other than internal threads) is a checkpoint. In this case checkpoint was called during connection close. 

      Unlike WT-8054, here are below the clean eviction threshold, but have some dirty data to evict. 

      Here's the full transaction/cache state dump:

      transaction state dump
      current ID: 270551
      last running ID: 270551
      metadata_pinned ID: 270549
      oldest ID: 270551
      durable timestamp: (0, 1)
      oldest timestamp: (0, 1)
      pinned timestamp: (0, 1)
      stable timestamp: (0, 1)
      has_durable_timestamp: yes
      has_oldest_timestamp: yes
      has_pinned_timestamp: yes
      has_stable_timestamp: yes
      oldest_is_pinned: yes
      stable_is_pinned: yes
      checkpoint running: yes
      checkpoint generation: 7
      checkpoint pinned ID: 270549
      checkpoint txn ID: 270549
      session count: 10
      Transaction state of active sessions:
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      cache dump
      cache full: no
      cache clean check: no (30.344%)
      cache dirty check: yes (2.523%)
      cache updates check: no (2.443%)
      file:T00003.wt(<live>):
      internal: 4 pages, 0MB, 4/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 481 pages, 29MB, 481/0 clean/dirty pages, 29/0/2 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00001.wt(<live>):
      internal: 198 pages, 2MB, 198/0 clean/dirty pages, 2/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 6995 pages, 29MB, 6995/0 clean/dirty pages, 29/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00004.wt(<live>):
      internal: 1 pages, 1MB, 1/0 clean/dirty pages, 1/0 clean/dirty MB, 1MB max page, 0MB max dirty page
      leaf: 1500 pages, 44MB, 1500/0 clean/dirty pages, 44/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:F00006(<live>):
      internal: 590 pages, 3MB, 590/0 clean/dirty pages, 3/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 9690 pages, 35MB, 9690/0 clean/dirty pages, 35/0/1 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00005.wt(<live>):
      internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 503 pages, 13MB, 503/0 clean/dirty pages, 13/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00002.wt(<live>):
      internal: 6 pages, 0MB, 6/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 600 pages, 43MB, 600/0 clean/dirty pages, 43/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00010.wt(<live>):
      internal: 319 pages, 10MB, 319/0 clean/dirty pages, 10/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 44748 pages, 96MB, 44748/0 clean/dirty pages, 96/0/2 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00009.wt(<live>):
      internal: 3950 pages, 14MB, 3945/5 clean/dirty pages, 14/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 38805 pages, 122MB, 26649/12156 clean/dirty pages, 79/42/34 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:F00008(<live>):
      internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 1661 pages, 22MB, 1579/82 clean/dirty pages, 20/1/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:T00007.wt(<live>):
      internal: 16 pages, 6MB, 16/0 clean/dirty pages, 6/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 25719 pages, 51MB, 25638/81 clean/dirty pages, 51/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:WiredTigerHS.wt(<live>):
      internal: 1 pages, 2MB, 1/0 clean/dirty pages, 2/0 clean/dirty MB, 2MB max page, 0MB max dirty page
      leaf: 157 pages, 4MB, 157/0 clean/dirty pages, 4/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      file:WiredTiger.wt(<live>):
      internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
      cache dump: total found: 577MB vs tracked inuse 534MB
      total dirty bytes: 44MB vs tracked dirty 44MB
      total updates bytes: 43MB vs tracked updates 46MB

       And here's the stack trace of the checkpointing thread:

      Thread 5 (Thread 0x7f97d1828740 (LWP 106972)):
      #0  0x00007f97d0df9467 in pread64 () from /lib64/libpthread.so.0
      #1  0x00000000004d0a13 in __posix_file_read (file_handle=0x7f97b800ab50, wt_session=0x96d180, 
          offset=61703168, len=512, buf=<optimized out>) at ../src/os_posix/os_fs.c:425
      #2  0x000000000054c1ff in __wt_read (buf=<optimized out>, len=512, offset=61703168, 
          fh=<optimized out>, session=0x96d180) at ../src/include/os_fhandle_inline.h:106
      #3  __wt_block_read_off (session=0x96d180, block=0x7f97b8008200, buf=0xabf1d0, objectid=0, 
          offset=61703168, size=512, checksum=3990653443) at ../src/block/block_read.c:263
      #4  0x000000000054ca44 in __wt_bm_read (bm=0x7f97b800b830, session=0x96d180, buf=0xabf1d0, 
          addr=<optimized out>, addr_size=<optimized out>) at ../src/block/block_read.c:104
      #5  0x000000000045b92c in __wt_bt_read (session=session@entry=0x96d180, buf=buf@entry=0xabf1d0, 
          addr=0xcbb26c "\343\001\266\201\201\344\355\334i\303?0000201779/LMNOA0000201780.00/op\200\302\026\060\060\060\060\062\060\061\067\070\060/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWX"..., 
          addr_size=10) at ../src/btree/bt_io.c:43
      #6  0x000000000045cccf in __ovfl_read (store=0xabf1d0, addr_size=<optimized out>, 
          addr=<optimized out>, session=0x96d180) at ../src/btree/bt_ovfl.c:57
      #7  __wt_ovfl_read (session=session@entry=0x96d180, page=page@entry=0x0, 
          unpack=unpack@entry=0x7ffd3b79cb80, store=store@entry=0xabf1d0, 
          decoded=decoded@entry=0x7ffd3b79cb20) at ../src/btree/bt_ovfl.c:57
      #8  0x000000000046348a in __cell_data_ref (page=0x0, store=0xabf1d0, unpack=0x7ffd3b79cb80, 
          page_type=<optimized out>, session=0x96d180) at ../src/include/cell_inline.h:1122
      #9  __wt_dsk_cell_data_ref (store=0xabf1d0, unpack_arg=0x7ffd3b79cb80, page_type=<optimized out>, 
          session=0x96d180) at ../src/include/cell_inline.h:1159
      #10 __verify_dsk_row_leaf (addr=0x0, dsk=0xcbb200, tag=0x66418c "[write-check]", session=0x96d180)
          at ../src/btree/bt_vrfy_dsk.c:554
      #11 __wt_verify_dsk_image (session=0x96d180, tag=0x66418c "[write-check]", dsk=0xcbb200, 
          size=<optimized out>, addr=0x0, empty_page_ok=<optimized out>)
          at ../src/btree/bt_vrfy_dsk.c:178
      #12 0x000000000045be16 in __wt_bt_write (session=session@entry=0x96d180, buf=buf@entry=0x9fa6f0, 
          addr=addr@entry=0x7ffd3b79cd60 "\300o\370\224\227\177", 
          addr_sizep=addr_sizep@entry=0x7ffd3b79cd50, 
          compressed_sizep=compressed_sizep@entry=0x7ffd3b79cd58, checkpoint=checkpoint@entry=false, 
          checkpoint_io=true, compressed=false) at ../src/btree/bt_io.c:223
      #13 0x00000000004e4d37 in __rec_split_write (session=0x96d180, r=0x9fa590, chunk=<optimized out>, 
          last_block=<optimized out>, compressed_image=0x0) at ../src/reconcile/rec_write.c:1867
      #14 0x00000000004ea4a3 in __rec_split_finish_process_prev (r=0x9fa590, session=0x96d180)
          at ../src/reconcile/rec_write.c:1377
      #15 __wt_rec_split_finish (session=0x96d180, r=0x9fa590) at ../src/reconcile/rec_write.c:1407
      #16 0x0000000000640959 in __wt_rec_row_leaf (session=session@entry=0x96d180, r=r@entry=0x9fa590, 
          pageref=pageref@entry=0x7f97ad01cc10, salvage=salvage@entry=0x0)
          at ../src/reconcile/rec_row.c:1074
      #17 0x00000000004e89f9 in __reconcile (page_lockedp=<synthetic pointer>, flags=68, salvage=0x0, 
          ref=0x7f97ad01cc10, session=0x96d180) at ../src/reconcile/rec_write.c:195
      #18 __wt_reconcile (session=session@entry=0x96d180, ref=ref@entry=0x7f97ad01cc10, 
          salvage=salvage@entry=0x0, flags=flags@entry=68) at ../src/reconcile/rec_write.c:97
      #19 0x00000000005bdbe2 in __wt_sync_file (session=session@entry=0x96d180, 
          syncop=syncop@entry=WT_SYNC_CHECKPOINT) at ../src/btree/bt_sync.c:661
      #20 0x000000000052afe1 in __checkpoint_tree (session=0x96d180, is_checkpoint=<optimized out>, 
          cfg=<optimized out>) at ../src/txn/txn_ckpt.c:1880
      #21 0x000000000052bf35 in __checkpoint_tree_helper (cfg=0x7ffd3b79d640, session=0x96d180)
          at ../src/txn/txn_ckpt.c:1996
      #22 __checkpoint_apply_to_dhandles (op=<optimized out>, cfg=<optimized out>, 
          session=<optimized out>) at ../src/txn/txn_ckpt.c:198
      #23 __txn_checkpoint (session=0x96d180, cfg=0x7ffd3b79d640) at ../src/txn/txn_ckpt.c:922
      #24 0x000000000052d9dc in __txn_checkpoint_wrapper (cfg=0x7ffd3b79d640, session=0x96d180)
          at ../src/txn/txn_ckpt.c:1165
      #25 __wt_txn_checkpoint (session=0x96d180, cfg=cfg@entry=0x7ffd3b79d640, 
          waiting=waiting@entry=true) at ../src/txn/txn_ckpt.c:1219
      #26 0x0000000000525838 in __wt_txn_global_shutdown (session=session@entry=0x96c9a0, 
          cfg=cfg@entry=0x7ffd3b79d6e0) at ../src/txn/txn.c:2458
      #27 0x00000000004696fe in __conn_close (wt_conn=0x9ec000, config=<optimized out>)
          at ../src/conn/conn_api.c:1175
      #28 0x000000000044d0ad in wts_close (connp=connp@entry=0x8e4e20 <g>, 
          sessionp=sessionp@entry=0x8e4e30 <g+16>) at ../../../test/format/wts.c:525
      #29 0x0000000000437488 in main (argc=<optimized out>, argv=<optimized out>)
          at ../../../test/format/t.c:342
      

        1. CONFIG_WT8356
          9 kB
        2. image-2022-01-11-18-52-55-374.png
          image-2022-01-11-18-52-55-374.png
          616 kB
        3. per_file_stats_dirty_bytes.png
          per_file_stats_dirty_bytes.png
          214 kB
        4. per_file_stats.png
          per_file_stats.png
          202 kB
        5. RUNDIR.2_evict_all.log
          7 kB
        6. stats_evict_all.tar.gz
          5.10 MB
        7. stats_per_file.tar.gz
          38.43 MB
        8. stats.tar.gz
          4.43 MB

            Assignee:
            alexey.anisimov@mongodb.com Alexey Anisimov (Inactive)
            Reporter:
            xgen-evg-user Xgen-Evergreen-User
            Etienne Petrel, Jie Chen, Sulabh Mahajan
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: