  1. WiredTiger
  2. WT-12846

Fix how compact walk handles EBUSY from checkpoint flush_lock

    • Storage Engines
    • 8
    • 2024-08-06 - Withholding Tax
    • v8.0, v7.0, v6.0, v5.0

      Background

      Compaction and checkpoint may conflict. When this happens, one may see the following error:

                          WT_ERR_MSG(session, EBUSY,
                            "Compaction halted at data handle %s by eviction pressure. Returning EBUSY.",
                            session->op_handle[i]->name);
      

      This happens when compaction and checkpoint try to acquire the same lock, the tree's flush_lock, at the same time. See the following code:

          /*
           * We could corrupt a checkpoint if we moved a block that's part of the checkpoint, that is, if
           * we race with checkpoint's review of the tree. Get the tree's flush lock which blocks threads
           * writing pages for checkpoints, and hold it long enough to review a single internal page. Quit
           * working the file if checkpoint is holding the lock, checkpoint holds the lock for relatively
           * long periods.
           */
          WT_RET(__wt_spin_trylock(session, &S2BT(session)->flush_lock));
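
The try-lock pattern in the snippet above can be sketched in isolation. This is a minimal illustration, not WiredTiger code: `toy_spinlock`, `toy_spin_trylock`, `toy_spin_unlock` and `toy_compact_review_page` are hypothetical names, and the sketch assumes that a failed try-lock surfaces to the caller as EBUSY, as the ticket title suggests.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a tree's flush lock; not WiredTiger's spinlock type. */
typedef struct {
    atomic_flag held;
} toy_spinlock;

/* Try to take the lock without waiting: 0 on success, EBUSY if already held. */
static int
toy_spin_trylock(toy_spinlock *lock)
{
    return (atomic_flag_test_and_set(&lock->held) ? EBUSY : 0);
}

/* Release the lock so the next try-lock can succeed. */
static void
toy_spin_unlock(toy_spinlock *lock)
{
    atomic_flag_clear(&lock->held);
}

/*
 * Hypothetical compact step: review one internal page under the lock, or give
 * up immediately with EBUSY if another thread (for example, checkpoint) holds it.
 */
static int
toy_compact_review_page(toy_spinlock *flush_lock, int *pages_reviewed)
{
    int ret;

    if ((ret = toy_spin_trylock(flush_lock)) != 0)
        return (ret); /* Assumed to mirror WT_RET: hand the error to the caller. */
    ++*pages_reviewed;
    toy_spin_unlock(flush_lock);
    return (0);
}
```

Initializing the lock with `toy_spinlock lock = {ATOMIC_FLAG_INIT};` and calling `toy_compact_review_page` while another caller holds the lock returns EBUSY without reviewing the page, which is the early exit described in this ticket.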
      

      Problem

      This conflict has been evident in multiple help tickets (linked below). In these cases, compact both took a long time to complete and recovered little space relative to the amount reported by the "bytes available for reuse" statistic. One possible explanation is that the checkpoint conflict causes the compact tree walk to exit early, before it can reach the pages it needs to rewrite from the end of the file. Compact then restarts its walk each time and repeatedly reads and evicts the same internal pages.
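
The suspected failure mode can be illustrated with a toy model. This is a sketch under loud assumptions, not a reproduction: pages are numbered 0, 1, 2, ... in walk order, the page that needs rewriting sits at index `target`, and every walk attempt is cut off by EBUSY after exactly `pages_before_busy` page reviews. The names `toy_walk_result` and `toy_walk_with_restart` are hypothetical.

```c
#include <stdbool.h>

/* Outcome of a simulated compact run (hypothetical type for this sketch). */
typedef struct {
    long reviews; /* Total page reviews across all attempts. */
    bool reached; /* Whether the walk ever got to the target page. */
} toy_walk_result;

/*
 * Simulate a walk that restarts from page 0 after every interruption.
 * Assumption: each attempt is interrupted after pages_before_busy reviews.
 */
static toy_walk_result
toy_walk_with_restart(int target, int pages_before_busy, int max_attempts)
{
    toy_walk_result r = {0, false};

    for (int attempt = 0; attempt < max_attempts && !r.reached; attempt++) {
        /* Every attempt begins again at the first page. */
        for (int page = 0; page < pages_before_busy; page++) {
            r.reviews++;
            if (page == target) {
                r.reached = true;
                break;
            }
        }
    }
    return (r);
}
```

In this model, `toy_walk_with_restart(90, 40, 10)` performs 400 page reviews and never reaches page 90, because each attempt only covers pages 0 through 39. With `pages_before_busy` set to 100, a single attempt reaches the target after 91 reviews.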

      Acceptance Criteria

      1. Create a test to reproduce the case where checkpoint contends with compact due to the flush_lock and compact exits its walk early.
        • Verify whether this causes compact to fail to reclaim space.
      2. Ensure compact still reclaims space in this scenario.
      3. Consider whether compaction could block checkpoint if it does not let checkpoint get the flush_lock. This could be tested with a change like the following:
        diff --git a/src/btree/bt_compact.c b/src/btree/bt_compact.c
        index f0e8bae52..db41a81c2 100644
        --- a/src/btree/bt_compact.c
        +++ b/src/btree/bt_compact.c
        @@ -254,6 +254,7 @@ __compact_walk_internal(WT_SESSION_IMPL *session, WT_REF *parent)
              */
             overall_progress = false;
             WT_INTL_FOREACH_BEGIN (session, parent->page, ref) {
        +        // Sleep and check that checkpoint is waiting.
                 if (F_ISSET(ref, WT_REF_FLAG_LEAF)) {
                     WT_ERR(__compact_page(session, ref, &skipp));
                     if (!skipp)
        

      Suggested Solutions

      1. One possible solution is to remove checkpoints from compact entirely. This would mean relying on a separate checkpoint thread to complete the checkpoints relied on by compact.
      2. We could leave the flush_lock contention as is and improve the compact tree walk so that it doesn't repeat pages it's already looked at.
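
The second suggestion can be sketched as a walk that remembers where it stopped. This is a toy model, not a proposed patch: pages are numbered in walk order, each attempt is assumed to be interrupted by EBUSY after exactly `pages_before_busy` reviews, and the names `toy_walk_result` and `toy_walk_with_resume` are hypothetical. It also assumes the tree does not change between attempts; a real saved position would need to be revalidated.

```c
#include <stdbool.h>

/* Outcome of a simulated compact run (hypothetical type for this sketch). */
typedef struct {
    long reviews; /* Total page reviews across all attempts. */
    bool reached; /* Whether the walk ever got to the target page. */
} toy_walk_result;

/*
 * Simulate a walk that resumes from its saved position after an interruption.
 * Assumption: each attempt is interrupted after pages_before_busy reviews.
 */
static toy_walk_result
toy_walk_with_resume(int target, int pages_before_busy, int max_attempts)
{
    toy_walk_result r = {0, false};
    int next_page = 0; /* Saved position, kept across attempts. */

    for (int attempt = 0; attempt < max_attempts && !r.reached; attempt++) {
        for (int n = 0; n < pages_before_busy; n++) {
            int page = next_page++; /* Continue where the last attempt stopped. */

            r.reviews++;
            if (page == target) {
                r.reached = true;
                break;
            }
        }
    }
    return (r);
}
```

In this model, `toy_walk_with_resume(90, 40, 10)` reaches page 90 on the third attempt after 91 reviews in total (40 + 40 + 11), and no page is reviewed twice.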

        1. test_compact_checkpoint.py
          4 kB
        2. flush_lock calls.png
          114 kB

            Assignee:
            sean.watt@mongodb.com Sean Watt
            Reporter:
            etienne.petrel@mongodb.com Etienne Petrel