Type: Task
Resolution: Unresolved
Priority: Major - P3
None
Affects Version/s: None
Component/s: Checkpoints, Compaction
Labels:
Storage Engines
3
StorEng - Defined Pipeline
Compaction and checkpoint can conflict. When this happens, compaction may fail with:
WT_ERR_MSG(session, EBUSY,
"Compaction halted at data handle %s by eviction pressure. Returning EBUSY.",
session->op_handle[i]->name);
This happens when compaction and checkpoint try to acquire the same lock (the tree's flush_lock) at the same time. See:
/*
 * We could corrupt a checkpoint if we moved a block that's part of the checkpoint, that is, if
 * we race with checkpoint's review of the tree. Get the tree's flush lock which blocks threads
 * writing pages for checkpoints, and hold it long enough to review a single internal page. Quit
 * working the file if checkpoint is holding the lock, checkpoint holds the lock for relatively
 * long periods.
 */
WT_RET(__wt_spin_trylock(session, &S2BT(session)->flush_lock));
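This ticket should investigate whether compaction can block checkpoint by preventing it from acquiring the flush_lock. I think this could be demonstrated with a change like the following. It marks where to sleep inside compaction's internal-page walk (__compact_walk_internal) and then check that checkpoint is waiting: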
diff --git a/src/btree/bt_compact.c b/src/btree/bt_compact.c
index f0e8bae52..db41a81c2 100644
--- a/src/btree/bt_compact.c
+++ b/src/btree/bt_compact.c
@@ -254,6 +254,7 @@ __compact_walk_internal(WT_SESSION_IMPL *session, WT_REF *parent)
      */
     overall_progress = false;
     WT_INTL_FOREACH_BEGIN (session, parent->page, ref) {
+        // Sleep and check that checkpoint is waiting.
         if (F_ISSET(ref, WT_REF_FLAG_LEAF)) {
             WT_ERR(__compact_page(session, ref, &skipp));
             if (!skipp)