WiredTiger / WT-3844

Checkpoints can hang on limbo pages


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • WT3.1.0
    • 3.6.3, 3.7.2, WT3.1.0

    Description

      As seen in this run: http://build.wiredtiger.com:8080/job/wiredtiger-test-unit/6791/console

      We end up with all internal threads sleeping and the checkpoint blocked:

            9 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal
            1 __wt_gen,__wt_session_gen_enter,__wt_hazard_check,__page_read,__wt_page_in_func,
              __wt_page_swap_func,__tree_walk_internal,__wt_tree_walk,__sync_file,__wt_cache_op,
              __checkpoint_tree,__checkpoint_tree_helper,__checkpoint_apply,__txn_checkpoint,
              __txn_checkpoint_wrapper,__wt_txn_checkpoint,__session_checkpoint,
              _wrap_Session_checkpoint,PyEval_EvalFrameEx,...
      

      The checkpoint is trying to read a page in the WT_REF_LIMBO state, but the eviction server thread holds a hazard pointer on that page, which prevents the checkpoint from making progress.
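
The blocking pattern can be modelled in a few lines of C. This is a minimal single-threaded sketch, not WiredTiger code: `struct session`, `hazard_set`, `hazard_clear`, `hazard_check` and `read_exclusive` are hypothetical names chosen for illustration, and the retry bound exists only so the example terminates. It assumes the reader cannot proceed while any session still publishes a hazard pointer on the page, which is what the stack trace above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HAZARD_SLOTS 4

struct page { int id; };

/* Hypothetical session: publishes the pages it is currently using. */
struct session {
    const struct page *hazard[HAZARD_SLOTS];
};

/* Publish a hazard pointer; returns false if every slot is in use. */
static bool hazard_set(struct session *s, const struct page *p)
{
    for (size_t i = 0; i < HAZARD_SLOTS; i++)
        if (s->hazard[i] == NULL) {
            s->hazard[i] = p;
            return true;
        }
    return false;
}

/* Drop a previously published hazard pointer. */
static void hazard_clear(struct session *s, const struct page *p)
{
    for (size_t i = 0; i < HAZARD_SLOTS; i++)
        if (s->hazard[i] == p) {
            s->hazard[i] = NULL;
            return;
        }
    assert(!"hazard pointer was never set");
}

/* Return true if any session still references the page. */
static bool hazard_check(const struct session *sessions, size_t n,
    const struct page *p)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < HAZARD_SLOTS; j++)
            if (sessions[i].hazard[j] == p)
                return true;
    return false;
}

/*
 * A reader that needs the page to itself. It retries a bounded number of
 * times here; if the holder never releases, an unbounded version of this
 * loop never returns, which is the hang described in this ticket.
 */
static bool read_exclusive(const struct session *sessions, size_t n,
    const struct page *p, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
        if (!hazard_check(sessions, n, p))
            return true;
    return false;
}
```

In this model the reader succeeds only after the other session calls `hazard_clear`. If that session is asleep while still holding the pointer, nothing ever changes between retries.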

      The eviction server already takes care not to stop on certain pages. This looks like another case that needs to be handled there.
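
One possible shape for such a check is sketched below. It is an assumption about how a "don't stop here" rule could look, not the fix that was actually committed: `enum ref_state`, `ok_to_pause_on` and `next_pause_point` are hypothetical, and `REF_LIMBO` merely stands in for the `WT_REF_LIMBO` state named in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page reference states; REF_LIMBO stands in for WT_REF_LIMBO. */
enum ref_state { REF_MEM, REF_LIMBO };

struct ref { enum ref_state state; };

/* Hypothetical rule: the walk must not stay positioned on a limbo page. */
static bool ok_to_pause_on(const struct ref *r)
{
    return r->state != REF_LIMBO;
}

/*
 * Starting at index start, return the first position where the walk may
 * pause (and so keep its hazard pointer while sleeping), or n if the walk
 * should release its position entirely.
 */
static size_t next_pause_point(const struct ref *refs, size_t n, size_t start)
{
    assert(start <= n);
    for (size_t i = start; i < n; i++)
        if (ok_to_pause_on(&refs[i]))
            return i;
    return n;
}
```

The design point is that the walker decides where to rest before it goes to sleep, so a thread that needs a limbo page never has to wait on a sleeping holder.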

            People

              michael.cahill@mongodb.com Michael Cahill