Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-8317

Never store overflow keys on internal pages

    • Type: Icon: Build Failure Build Failure
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • WT10.0.1, 5.2.0, 5.0.5
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • 8
    • Storage - Ra 2021-11-01, Storage - Ra 2021-11-15

      This ticket describes the a new cache stuck failure happening in test/format. This follows on from the recent work performed in WT-7010. It is moved here, because the original error in WT-7010 has already been fixed. The recent error that caused WT-7010 to re-open again describes a different cache stuck issue.

      The cache stuck symptoms:

      [2021/08/07 13:43:24.436]     [1628343522:178826][29521:0x703706aef160], eviction-server: __evict_server, 452: Cache stuck for too long, giving up: Connection timed out
       [2021/08/07 13:43:24.436]     0x1002f9679e0:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
       [2021/08/07 13:43:24.436]     0x1002f9679e0:transaction state dump
       [2021/08/07 13:43:24.436]     0x1002f9679e0:current ID: 20942
       [2021/08/07 13:43:24.436]     0x1002f9679e0:last running ID: 20942
       [2021/08/07 13:43:24.436]     0x1002f9679e0:metadata_pinned ID: 20855
       [2021/08/07 13:43:24.436]     0x1002f9679e0:oldest ID: 20942
       [2021/08/07 13:43:24.436]     0x1002f9679e0:durable timestamp: (0, 0)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:oldest timestamp: (0, 0)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:pinned timestamp: (0, 0)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:stable timestamp: (0, 0)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:has_durable_timestamp: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:has_oldest_timestamp: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:has_pinned_timestamp: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:has_stable_timestamp: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:oldest_is_pinned: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:stable_is_pinned: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:checkpoint running: yes
       [2021/08/07 13:43:24.436]     0x1002f9679e0:checkpoint generation: 4
       [2021/08/07 13:43:24.436]     0x1002f9679e0:checkpoint pinned ID: 20764
       [2021/08/07 13:43:24.436]     0x1002f9679e0:checkpoint txn ID: 20855
       [2021/08/07 13:43:24.436]     0x1002f9679e0:session count: 43
       [2021/08/07 13:43:24.436]     0x1002f9679e0:Transaction state of active sessions:
       [2021/08/07 13:43:24.436]     0x1002f9679e0:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache dump
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache full: no
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache clean check: no (79.574%)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache dirty check: no (11.160%)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache updates check: yes (10.726%)
       [2021/08/07 13:43:24.436]     0x1002f9679e0:file:wt.wt(<live>):
       [2021/08/07 13:43:24.436]     0x1002f9679e0:internal: 11629 pages, 83MB, 10032/1597 clean/dirty pages, 72/10 clean/dirty MB, 0MB max page, 0MB max dirty page
       [2021/08/07 13:43:24.436]     0x1002f9679e0:leaf: 107005 pages, 87MB, 88923/18082 clean/dirty pages, 63/23/23 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
       [2021/08/07 13:43:24.436]     0x1002f9679e0:file:WiredTigerHS.wt(<live>):
       [2021/08/07 13:43:24.436]     0x1002f9679e0:internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
       [2021/08/07 13:43:24.436]     0x1002f9679e0:leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
       [2021/08/07 13:43:24.436]     0x1002f9679e0:file:WiredTiger.wt(<live>):
       [2021/08/07 13:43:24.436]     0x1002f9679e0:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
       [2021/08/07 13:43:24.436]     0x1002f9679e0:leaf: 0 pages
       [2021/08/07 13:43:24.436]     0x1002f9679e0:cache dump: total found: 184MB vs tracked inuse 170MB
       [2021/08/07 13:43:24.436]     0x1002f9679e0:total dirty bytes: 34MB vs tracked dirty 34MB
       [2021/08/07 13:43:24.436]     0x1002f9679e0:total updates bytes: 23MB vs tracked updates 24MB
       [2021/08/07 13:43:24.436]     [1628343522:242187][29521:0x703706aef160], eviction-server: __wt_evict_thread_run, 324: cache eviction thread error: Connection timed out
       [2021/08/07 13:43:24.436]     [1628343522:242212][29521:0x703706aef160], eviction-server: __wt_evict_thread_run, 324: the process must exit and restart: WT_PANIC: WiredTiger library panic
       [2021/08/07 13:43:24.436]     [1628343522:242222][29521:0x703706aef160], eviction-server: __wt_abort, 28: aborting WiredTiger library
      

      The symptoms of the cache failure here is shown through the backtrace:

      #0  __libc_pwrite64 (offset=<optimized out>, count=1024, buf=0x7fac848c9000, fd=8) at ../sysdeps/unix/sysv/linux/pwrite64.c:29
      #1  __libc_pwrite64 (fd=8, buf=buf@entry=0x7fac848c9000, count=count@entry=1024, offset=offset@entry=154221056) at ../sysdeps/unix/sysv/linux/pwrite64.c:27
      #2  0x00000000004ccf43 in __posix_file_write (file_handle=0x210e7c0, wt_session=0x209b5c0, offset=154221056, len=1024, buf=<optimized out>) at ../src/os_posix/os_fs.c:610
      #3  0x0000000000549235 in __wt_write (buf=0x7fac848c9000, len=1024, offset=<optimized out>, fh=<optimized out>, session=0x209b5c0) at ../src/include/os_fhandle_inline.h:182
      #4  __block_write_off (caller_locked=<optimized out>, checkpoint_io=<optimized out>, data_checksum=<optimized out>, checksump=0x7fac767fb404, sizep=0x7fac767fb40c, offsetp=0x7fac767fb410, 
          objectidp=<optimized out>, buf=<optimized out>, block=0xa0254a0, session=0x209b5c0) at ../src/block/block_write.c:327
      #5  __wt_block_write_off (session=0x209b5c0, block=0xa0254a0, buf=<optimized out>, objectidp=0x7fac767fb408, offsetp=0x7fac767fb410, sizep=0x7fac767fb40c, checksump=0x7fac767fb404, 
          data_checksum=false, checkpoint_io=true, caller_locked=false) at ../src/block/block_write.c:389
      #6  0x0000000000549b39 in __wt_block_write (session=<optimized out>, block=0xa0254a0, buf=<optimized out>, addr=0x7fac767fb590 ")", addr_sizep=0x7fac767fb580, data_checksum=<optimized out>, 
          checkpoint_io=true) at ../src/block/block_write.c:195
      #7  0x000000000045a249 in __wt_bt_write (session=session@entry=0x209b5c0, buf=buf@entry=0x7fac8433e9e0, addr=addr@entry=0x7fac767fb590 ")", addr_sizep=addr_sizep@entry=0x7fac767fb580, 
          compressed_sizep=compressed_sizep@entry=0x7fac767fb588, checkpoint=checkpoint@entry=false, checkpoint_io=<optimized out>, compressed=<optimized out>) at ../src/btree/bt_io.c:351
      #8  0x00000000004e1417 in __rec_split_write (session=0x209b5c0, r=0x7fac8433e880, chunk=<optimized out>, last_block=<optimized out>, compressed_image=0x0) at ../src/reconcile/rec_write.c:1863
      #9  0x0000000000636019 in __wt_rec_row_leaf (session=session@entry=0x209b5c0, r=r@entry=0x7fac8433e880, pageref=pageref@entry=0x7fac6006b5e0, salvage=salvage@entry=0x0)
          at ../src/reconcile/rec_row.c:1069
      #10 0x00000000004e4bb2 in __reconcile (page_lockedp=<synthetic pointer>, flags=68, salvage=0x0, ref=0x7fac6006b5e0, session=0x209b5c0) at ../src/reconcile/rec_write.c:194
      #11 __wt_reconcile (session=session@entry=0x209b5c0, ref=ref@entry=0x7fac6006b5e0, salvage=salvage@entry=0x0, flags=flags@entry=68) at ../src/reconcile/rec_write.c:99
      #12 0x00000000005b4cc0 in __wt_sync_file (session=session@entry=0x209b5c0, syncop=syncop@entry=WT_SYNC_CHECKPOINT) at ../src/btree/bt_sync.c:657
      #13 0x0000000000526ca1 in __checkpoint_tree (session=0x209b5c0, is_checkpoint=<optimized out>, cfg=<optimized out>) at ../src/txn/txn_ckpt.c:1878
      #14 0x0000000000527bf5 in __checkpoint_tree_helper (cfg=0x7fac767fbe20, session=0x209b5c0) at ../src/txn/txn_ckpt.c:1994
      #15 __checkpoint_apply_to_dhandles (op=<optimized out>, cfg=<optimized out>, session=<optimized out>) at ../src/txn/txn_ckpt.c:198
      #16 __txn_checkpoint (session=0x209b5c0, cfg=0x7fac767fbe20) at ../src/txn/txn_ckpt.c:920
      #17 0x000000000052969c in __txn_checkpoint_wrapper (cfg=0x7fac767fbe20, session=0x209b5c0) at ../src/txn/txn_ckpt.c:1163
      #18 __wt_txn_checkpoint (session=session@entry=0x209b5c0, cfg=cfg@entry=0x7fac767fbe20, waiting=waiting@entry=true) at ../src/txn/txn_ckpt.c:1217
      #19 0x000000000041b333 in __session_checkpoint (wt_session=0x209b5c0, config=0x64ccea "drop=(all)") at ../src/session/session_api.c:1878
      #20 0x0000000000439ba1 in checkpoint (arg=<optimized out>) at ../../../test/format/checkpoint.c:121
      #21 0x00007faca6dd2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #22 0x00007faca6cd0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Aborting Thread:

      Thread 1 (Thread 0x7faca6052700 (LWP 74586)):
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x00007faca6bd3859 in __GI_abort () at abort.c:79
      #2  0x0000000000413c60 in __wt_abort (session=session@entry=0x2095f20) at ../src/os_common/os_abort.c:30
      #3  0x000000000041c6d6 in __wt_panic_func (session=session@entry=0x2095f20, error=error@entry=110, func=func@entry=0x660cb0 <__PRETTY_FUNCTION__.46687> "__wt_evict_thread_run", line=line@entry=324, fmt=fmt@entry=0x66059f "cache eviction thread error") at ../src/support/err.c:387
      #4  0x000000000040ed04 in __wt_evict_thread_run (session=0x2095f20, thread=0x20c9550) at ../src/evict/evict_lru.c:324
      #5  0x0000000000518eb1 in __thread_run (arg=0x20c9550) at ../src/support/thread_group.c:31
      #6  0x00007faca6dd2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #7  0x00007faca6cd0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

        1. image-2021-11-05-16-24-11-845.png
          174 kB
          Jie Chen
        2. image-2021-11-05-22-18-07-836.png
          195 kB
          Jie Chen

            Assignee:
            jie.chen@mongodb.com Jie Chen
            Reporter:
            jie.chen@mongodb.com Jie Chen
            Alexey Anisimov (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: