WiredTiger / WT-3844

Checkpoints can hang on limbo pages

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: WT3.1.0
    • Fix Version/s: 3.6.3, 3.7.2, WT3.1.0
    • Component/s: None
    • Labels: None

      Description

      As seen in this run: http://build.wiredtiger.com:8080/job/wiredtiger-test-unit/6791/console

      We end up with all internal threads sleeping and the checkpoint blocked:

            9 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal
            1 __wt_gen,__wt_session_gen_enter,__wt_hazard_check,__page_read,__wt_page_in_func,
              __wt_page_swap_func,__tree_walk_internal,__wt_tree_walk,__sync_file,__wt_cache_op,
              __checkpoint_tree,__checkpoint_tree_helper,__checkpoint_apply,__txn_checkpoint,
              __txn_checkpoint_wrapper,__wt_txn_checkpoint,__session_checkpoint,
              _wrap_Session_checkpoint,PyEval_EvalFrameEx,...
      

      The checkpoint is trying to read a page in the WT_REF_LIMBO state, but the eviction server thread holds a hazard pointer on that page, which prevents the checkpoint from making progress. Because every internal thread is asleep, nothing releases that hazard pointer and the checkpoint waits indefinitely.

      The eviction server already takes care not to stop on certain pages. This looks like another case that needs to be handled there.
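
The interaction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not WiredTiger code: every type and function name below (`page_ref`, `hazard_table`, `try_read_limbo`, `eviction_park`, `checkpoint_read`) is hypothetical, and the rule that a limbo page can only be read once no other session holds a hazard pointer on it is taken from the description in this report rather than from the source tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states; only the "limbo" name echoes the report. */
typedef enum { REF_DISK, REF_LIMBO, REF_MEM } ref_state;

typedef struct {
    ref_state state;
} page_ref;

#define MAX_SESSIONS 4

/* One hazard slot per session: the page that session currently pins, or NULL. */
typedef struct {
    const page_ref *hazard[MAX_SESSIONS];
} hazard_table;

/* True if any session other than `self` holds a hazard pointer on `ref`. */
static bool
hazard_held_by_other(const hazard_table *t, size_t self, const page_ref *ref)
{
    assert(self < MAX_SESSIONS);
    for (size_t i = 0; i < MAX_SESSIONS; i++)
        if (i != self && t->hazard[i] == ref)
            return true;
    return false;
}

/*
 * Reader side (assumption): a limbo page can be brought fully into memory
 * only when no other session pins it. Returns false if the caller must retry.
 */
static bool
try_read_limbo(const hazard_table *t, size_t self, page_ref *ref)
{
    if (ref->state != REF_LIMBO)
        return true;
    if (hazard_held_by_other(t, self, ref))
        return false;
    ref->state = REF_MEM;
    return true;
}

/*
 * Eviction walk going idle. With skip_limbo false it keeps its hazard pointer
 * on whatever page it reached; with skip_limbo true it does not stay on a
 * limbo page, which is the kind of handling the report suggests.
 */
static void
eviction_park(hazard_table *t, size_t evict, const page_ref *ref, bool skip_limbo)
{
    assert(evict < MAX_SESSIONS);
    if (skip_limbo && ref->state == REF_LIMBO)
        t->hazard[evict] = NULL;
    else
        t->hazard[evict] = ref;
}

/*
 * Bounded retry loop standing in for the checkpoint's read. Returns the
 * number of attempts used, or -1 if the page never became readable.
 */
static int
checkpoint_read(const hazard_table *t, size_t self, page_ref *ref, int max_tries)
{
    for (int i = 1; i <= max_tries; i++)
        if (try_read_limbo(t, self, ref))
            return i;
    return -1;
}
```

In this model, calling `eviction_park` with `skip_limbo` set to false on a limbo page makes `checkpoint_read` exhaust its retries, which mirrors the hang in the stack trace. Parking with `skip_limbo` set to true lets the read succeed on the first attempt. The bounded loop is a convenience for the sketch; the real checkpoint has no such limit, which is why it blocks.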

              People

              • Assignee: Michael Cahill (michael.cahill)
              • Reporter: Michael Cahill (michael.cahill)