WiredTiger / WT-4690

Make sure eviction does not split during checkpoints

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT3.2.0, 4.1.11, 4.0.11
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Story Points: 8
    • Sprint: Storage Engines 2019-05-06
    • v4.0

      In a local run of test/format, I encountered a failure where the run had gone more than 15 minutes past the configured maximum time. Interestingly, the cache isn't full:

      0x7f1dc320b010:cache clean check: no (7.466%)
      0x7f1dc320b010:cache dirty check: no (2.422%)
      
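      Those two lines report cache usage against the eviction triggers, and "no" means neither check fired. A minimal sketch of the comparison being reported, using illustrative names and fields rather than WiredTiger's actual internals:

      #include <stdbool.h>
      #include <stdint.h>

      /* Illustrative cache state; WiredTiger's real fields differ. */
      struct cache_sketch {
          uint64_t bytes_inmem;            /* total bytes in cache */
          uint64_t bytes_dirty;            /* dirty bytes in cache */
          uint64_t cache_size;             /* configured maximum */
          uint32_t eviction_trigger;       /* percent, e.g. 95 */
          uint32_t eviction_dirty_trigger; /* percent, e.g. 20 */
      };

      /* "cache clean check": overall usage versus the clean trigger. */
      static bool
      cache_clean_needed(const struct cache_sketch *c, double *pctp)
      {
          *pctp = 100.0 * (double)c->bytes_inmem / (double)c->cache_size;
          return (*pctp > (double)c->eviction_trigger);
      }

      /* "cache dirty check": dirty usage versus the dirty trigger. */
      static bool
      cache_dirty_needed(const struct cache_sketch *c, double *pctp)
      {
          *pctp = 100.0 * (double)c->bytes_dirty / (double)c->cache_size;
          return (*pctp > (double)c->eviction_dirty_trigger);
      }

      At 7.466% overall and 2.422% dirty, both checks come back "no", so the stall is not cache pressure.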

      The call stacks show that all user threads are here:

      #2  __wt_spin_backoff (yield_count=0x7f1daa7e16b0, sleep_usecs=0x7f1daa7e16a8)
          at ../src/include/misc.i:245
      #3  __wt_page_in_func (session=0x7f1dc3215c58, ref=0x7f1d5081d2f0, flags=3746, 
          func=0x55aeac0a8310 <__func__.35452> "__tree_walk_internal", line=527) at ../src/btree/bt_read.c:864
      #4  __wt_page_swap_func (session=0x7f1dc3215c58, held=0x0, want=0x7f1d5081d2f0, flags=3746, 
          func=0x55aeac0a8310 <__func__.35452> "__tree_walk_internal", line=527) at ../src/include/btree.i:1702
      #5  __tree_walk_internal (session=0x7f1dc3215c58, refp=0x7f1d94146238, walkcntp=0x0, 
          skip_func=0x0, func_cookie=0x0, flags=2690) at ../src/btree/bt_walk.c:526
      #6  __wt_tree_walk (session=0x7f1dc3215c58, refp=0x7f1d94146238, flags=2688)
          at ../src/btree/bt_walk.c:597
      #7  __wt_btcur_prev (cbt=0x7f1d941460c0, truncating=false) at ../src/btree/bt_curprev.c:675
      #8  __curfile_prev (cursor=0x7f1d941460c0) at ../src/cursor/cur_file.c:147
      #9  read_op (cursor=0x7f1d941460c0, op=PREV, exactp=0x0) at ../../../test/format/ops.c:325
      #10 nextprev (tinfo=0x55aeaed34f90, cursor=0x7f1d941460c0, next=false)
          at ../../../test/format/ops.c:1364
      #11 ops (arg=0x55aeaed34f90) at ../../../test/format/ops.c:1107
      

      That thread is spinning in __wt_page_in_func, waiting for a page whose ref is in the WT_REF_LOCKED state (or a similar transitional state).
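
      For illustration, the retry pattern at the top of that stack looks roughly like the following. This is a minimal standalone sketch, not WiredTiger source; the real loop in __wt_page_in_func handles many more states:

      #include <sched.h>
      #include <stdint.h>
      #include <unistd.h>

      /* Illustrative subset of the ref states involved in the hang. */
      enum ref_state { REF_MEM, REF_LOCKED };

      /* Yield first, then sleep with an escalating, capped interval. */
      static void
      spin_backoff(uint64_t *yield_count, uint64_t *sleep_usecs)
      {
          if ((*yield_count)++ < 10) {
              sched_yield();
              return;
          }
          *sleep_usecs = *sleep_usecs == 0 ? 10 : *sleep_usecs * 2;
          if (*sleep_usecs > 100000)
              *sleep_usecs = 100000;
          usleep((useconds_t)*sleep_usecs);
      }

      /* A reader cannot make progress until the ref leaves REF_LOCKED. */
      static void
      page_in_wait(volatile enum ref_state *statep)
      {
          uint64_t sleep_usecs = 0, yield_count = 0;

          while (*statep == REF_LOCKED)
              spin_backoff(&yield_count, &sleep_usecs);
      }

      If the thread that locked the ref never releases it, this loop never exits, which matches the stuck user threads. Meanwhile, the checkpoint thread is attempting to do a reverse split: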

      #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      #1  0x00007f1dc2660023 in __GI___pthread_mutex_lock (mutex=0x7f1d482b76b8) at ../nptl/pthread_mutex_lock.c:78
      #2  0x000055aeabfb5ca0 in __wt_spin_lock (session=0x7f1dc3212920, t=0x7f1d482b76b8) at ../src/include/mutex.i:173
      #3  0x000055aeabfbc85e in __split_internal_lock (session=0x7f1dc3212920, ref=0x7f1d5081d2f0, trylock=false, 
          parentp=0x7f1dac7e7620) at ../src/btree/bt_split.c:1251
      #4  0x000055aeabfbf21b in __split_reverse (session=0x7f1dc3212920, ref=0x7f1d5081d2f0) at ../src/btree/bt_split.c:2229
      #5  0x000055aeabfbf313 in __wt_split_reverse (session=0x7f1dc3212920, ref=0x7f1d5081d2f0)
          at ../src/btree/bt_split.c:2251
      #6  0x000055aeabeeb7bb in __evict_delete_ref (session=0x7f1dc3212920, ref=0x7f1d5081d2f0, closing=false)
          at ../src/evict/evict_page.c:284
      #7  0x000055aeabeebbec in __evict_page_clean_update (session=0x7f1dc3212920, ref=0x7f1d5081d2f0, closing=false)
          at ../src/evict/evict_page.c:336
      #8  0x000055aeabeeb34b in __wt_evict (session=0x7f1dc3212920, ref=0x7f1d5081d2f0, closing=false, previous_state=5)
          at ../src/evict/evict_page.c:219
      #9  0x000055aeabeeacb2 in __wt_page_release_evict (session=0x7f1dc3212920, ref=0x7f1d5081d2f0)
          at ../src/evict/evict_page.c:92
      #10 0x000055aeabf15d93 in __wt_page_release (session=0x7f1dc3212920, ref=0x7f1d5081d2f0, flags=136)
          at ../src/include/btree.i:1521
      #11 0x000055aeabf19697 in __rec_child_modify (session=0x7f1dc3212920, r=0x7f1d6408d430, ref=0x7f1d5081d2f0, 
          hazardp=0x7f1dac7e7891, statep=0x7f1dac7e7894) at ../src/reconcile/rec_write.c:1922
      #12 0x000055aeabf1fe17 in __rec_row_int (session=0x7f1dc3212920, r=0x7f1d6408d430, page=0x55aeb07b0400)
          at ../src/reconcile/rec_write.c:4854
      #13 0x000055aeabf1674c in __wt_reconcile (session=0x7f1dc3212920, ref=0x55aeb0662db0, salvage=0x0, flags=1, 
          lookaside_retryp=0x0) at ../src/reconcile/rec_write.c:469
      #14 0x000055aeabfc387d in __wt_sync_file (session=0x7f1dc3212920, syncop=WT_SYNC_CHECKPOINT)
          at ../src/btree/bt_sync.c:337
      #15 0x000055aeabf69736 in __checkpoint_tree (session=0x7f1dc3212920, is_checkpoint=true, cfg=0x7f1dac7e7e20)
          at ../src/txn/txn_ckpt.c:1667
      #16 0x000055aeabf69a78 in __checkpoint_tree_helper (session=0x7f1dc3212920, cfg=0x7f1dac7e7e20)
          at ../src/txn/txn_ckpt.c:1783
      #17 0x000055aeabf65d44 in __checkpoint_apply (session=0x7f1dc3212920, cfg=0x7f1dac7e7e20, 
          op=0x55aeabf699ea <__checkpoint_tree_helper>) at ../src/txn/txn_ckpt.c:218
      #18 0x000055aeabf67941 in __txn_checkpoint (session=0x7f1dc3212920, cfg=0x7f1dac7e7e20) at ../src/txn/txn_ckpt.c:909
      #19 0x000055aeabf683ca in __txn_checkpoint_wrapper (session=0x7f1dc3212920, cfg=0x7f1dac7e7e20)
          at ../src/txn/txn_ckpt.c:1107
      #20 0x000055aeabf68546 in __wt_txn_checkpoint (session=0x7f1dc3212920, cfg=0x7f1dac7e7e20, waiting=true)
          at ../src/txn/txn_ckpt.c:1171
      #21 0x000055aeabf43f77 in __session_checkpoint (wt_session=0x7f1dc3212920, config=0x0)
          at ../src/session/session_api.c:1984
      #22 0x000055aeabeb1821 in checkpoint (arg=0x0) at ../../../test/format/util.c:571
      #23 0x00007f1dc265d6db in start_thread (arg=0x7f1dac7e8700) at pthread_create.c:463
      
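      For context on why eviction splits at all: __evict_delete_ref is reached when eviction discards a page that has become empty, and once enough of the parent's children are deleted it attempts a reverse split to shrink the parent's index. A simplified sketch of that decision; the structure, names, and threshold here are assumptions rather than the actual source:

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical stand-in for the parent internal page's index. */
      struct page_index_sketch {
          uint32_t entries;         /* child slots in the parent */
          uint32_t deleted_entries; /* slots already marked deleted */
      };

      /*
       * After marking one more child deleted, decide whether it is
       * worth restructuring the parent with a reverse split.
       */
      static bool
      should_reverse_split(struct page_index_sketch *pindex)
      {
          pindex->deleted_entries++;
          return (pindex->entries > 1 &&
              pindex->deleted_entries > pindex->entries / 10);
      }

      The reverse split itself (__wt_split_reverse) then needs to lock the parent page, which is where the two stacks collide.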

      Interestingly, there is a comment in __split_internal_lock indicating that we've dealt with the reverse of this situation in the past:

              /*
               * A checkpoint reconciling this parent page can deadlock with
               * our split. We have an exclusive page lock on the child before
               * we acquire the page's reconciliation lock, and reconciliation
               * acquires the page's reconciliation lock before it encounters
               * the child's exclusive lock (which causes reconciliation to
               * loop until the exclusive lock is resolved). If we want to split
               * the parent, give up to avoid that deadlock.
               */
      

      The open question is who is holding that lock. Page locks are only taken when modifying pages, and all the other threads are doing reads, so they shouldn't be holding it. Perhaps the checkpoint thread already holds the lock itself?
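
      Given the ticket title, the likely shape of the fix is for eviction to decline the reverse split while a checkpoint is syncing the tree. A hypothetical guard; the flag and names are assumptions, not WiredTiger's actual fields:

      #include <stdbool.h>

      /* Hypothetical per-tree state; actual field names differ. */
      struct btree_sketch {
          volatile bool syncing; /* set while a checkpoint syncs the tree */
      };

      /*
       * Eviction declines to reverse-split while the tree is being
       * checkpointed, avoiding the lock cycle with reconciliation.
       */
      static bool
      eviction_can_reverse_split(const struct btree_sketch *btree, bool closing)
      {
          return (!closing && !btree->syncing);
      }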

            Assignee: Donald Anderson (donald.anderson@mongodb.com)
            Reporter: Alexander Gorrod (alexander.gorrod@mongodb.com)
            Votes: 0
            Watchers: 4
