WiredTiger / WT-3844

Checkpoints can hang on limbo pages

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.3, 3.7.2, WT3.1.0
    • Affects Version/s: WT3.1.0
    • Component/s: None
    • Labels: None

      As seen in this run: http://build.wiredtiger.com:8080/job/wiredtiger-test-unit/6791/console

      We end up with all internal threads sleeping and the checkpoint blocked:

            9 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal
            1 __wt_gen,__wt_session_gen_enter,__wt_hazard_check,__page_read,__wt_page_in_func,
              __wt_page_swap_func,__tree_walk_internal,__wt_tree_walk,__sync_file,__wt_cache_op,
              __checkpoint_tree,__checkpoint_tree_helper,__checkpoint_apply,__txn_checkpoint,
              __txn_checkpoint_wrapper,__wt_txn_checkpoint,__session_checkpoint,
              _wrap_Session_checkpoint,PyEval_EvalFrameEx,...
      

      The checkpoint is trying to read a page in the WT_REF_LIMBO state, but the eviction server thread holds a hazard pointer on that page, which prevents the checkpoint from making progress.

      The eviction server already takes care not to stop on certain pages. This looks like another case that needs to be handled there.
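
To make the interaction concrete, here is a minimal toy model in C. It is not WiredTiger code: every `toy_` name is hypothetical, the hazard pointer is reduced to a single flag, and only the idea of a "limbo" page state is taken from the report. It sketches how an eviction walk that parks on a limbo page can block a checkpoint read, and how skipping limbo pages when choosing a stopping point would avoid that in this simplified setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states; TOY_REF_LIMBO stands in for WT_REF_LIMBO. */
typedef enum { TOY_REF_MEM, TOY_REF_LIMBO } toy_ref_state;

typedef struct {
    toy_ref_state state;
    bool hazard_held; /* true while another thread holds a hazard pointer */
} toy_ref;

/*
 * Checkpoint side (assumption of this model): a limbo page cannot be
 * read in while some other thread holds a hazard pointer on it.
 */
static bool
toy_checkpoint_can_read(const toy_ref *ref)
{
    return !(ref->state == TOY_REF_LIMBO && ref->hazard_held);
}

/*
 * Eviction side: choose the page where the walk parks and take a hazard
 * pointer on it. With skip_limbo set, the walk moves past limbo pages and
 * parks on the next non-limbo page; if every page is in limbo it falls
 * back to the starting position.
 */
static size_t
toy_eviction_park(toy_ref *refs, size_t n, size_t start, bool skip_limbo)
{
    size_t cand, i, pos;

    assert(n > 0 && start < n);

    pos = start;
    for (i = 0; i < n; i++) {
        cand = (start + i) % n;
        if (!skip_limbo || refs[cand].state != TOY_REF_LIMBO) {
            pos = cand;
            break;
        }
    }
    refs[pos].hazard_held = true;
    return pos;
}
```

In this model, parking at a limbo page with `skip_limbo` unset leaves `toy_checkpoint_can_read` returning false for that page for as long as the eviction thread sleeps, which mirrors the hang described above. With `skip_limbo` set, the walk parks on a neighbouring page and the checkpoint read can proceed.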

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Votes:
            0
            Watchers:
            2
