checkpoint_handle_dropped/locked/meta_checked never reset on mongodb-8.0


    • Storage Engines, Storage Engines - Persistence
    • 209.039
    • SE Persistence backlog
    • None
    • v8.3, v8.0

      Summary

      On the mongodb-8.0 branch, three connection-level checkpoint statistics grow monotonically across the lifetime of the process instead of being reset at the start of each checkpoint:

      • checkpoint_handle_dropped
      • checkpoint_handle_locked
      • checkpoint_handle_meta_checked

      (The matching duration stats checkpoint_handle_drop_duration, checkpoint_handle_lock_duration, and checkpoint_handle_meta_check_duration are affected by the same bug.)

      A workload with 4k–12k open data handles reported these counters ratcheting upward across every checkpoint. The expected behaviour is that they are reset before each gather and then published with the per-checkpoint value (matching the existing checkpoint_handle_applied / checkpoint_handle_skipped pattern).

      Root cause

      The bug was introduced by WT-12440 (commit 11257fbcef, Feb 2024), which added the connection counters ckpt_drop, ckpt_lock, and ckpt_meta_check (and their time companions) and published them via WT_STAT_CONN_SET at the end of each gather in __wt_conn_btree_apply. The patch did not zero them at the start of the gather. Only the two pre-existing counters, ckpt_apply and ckpt_skip (and their time companions), were reset:

      /* mongodb-8.0 src/conn/conn_dhandle.c, __wt_conn_btree_apply */
      if (WT_SESSION_IS_CHECKPOINT(session)) {
          time_start = __wt_clock(session);
          conn->ckpt_apply = conn->ckpt_skip = 0;
          conn->ckpt_apply_time = conn->ckpt_skip_time = 0;
          /* ckpt_drop / ckpt_lock / ckpt_meta_check are NEVER zeroed */
          F_SET(conn, WT_CONN_CKPT_GATHER);
      }
      

      The increment sites in __wt_checkpoint_get_handles (conn->ckpt_meta_check, conn->ckpt_lock) and in __checkpoint_lock_dirty_tree (conn->ckpt_drop) therefore accumulate across every checkpoint, and the end-of-gather publish writes the cumulative value:

      WT_STAT_CONN_SET(session, checkpoint_handle_dropped, conn->ckpt_drop);
      WT_STAT_CONN_SET(session, checkpoint_handle_locked, conn->ckpt_lock);
      WT_STAT_CONN_SET(session, checkpoint_handle_meta_checked, conn->ckpt_meta_check);
      

      With N dhandles and M checkpoints since process start, each of these stats reports roughly N × M instead of N.

      Why this is only on 8.0 and not on more recent branches

      WT-13990 (commit d48adcd6ad, Jan 2025) refactored the handle-related stats into a private struct WT_CKPT_HANDLE_STATS inside the checkpoint module and introduced __wt_checkpoint_handle_stats_clear, which zeros all of apply/drop/lock/meta_check/skip and their _time partners at the start of each gather. As a side effect of moving the reset into a single helper, WT-13990 fixed the missing resets.
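
      The reason a single clear helper closes this class of bug can be sketched as follows. The struct layout and both names below are hypothetical illustrations, not the actual WT_CKPT_HANDLE_STATS definition or the real helper: once every counter lives in one struct and is cleared as a unit, a newly added field cannot be left out of the reset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; field names are illustrative, not copied from WiredTiger. */
struct ckpt_handle_stats_sketch {
    uint64_t apply, drop, lock, meta_check, skip;
    uint64_t apply_time, drop_time, lock_time, meta_check_time, skip_time;
};

/*
 * Clear every per-gather counter in one place. Because the whole struct is
 * zeroed, any field added to it later is reset automatically.
 */
static void
ckpt_handle_stats_clear_sketch(struct ckpt_handle_stats_sketch *stats)
{
    assert(stats != NULL);
    memset(stats, 0, sizeof(*stats));
}
```

      The 8.0 code instead lists each counter by name in the reset block, which is how the counters added by WT-12440 were missed.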

      git branch --contains confirms the picture:

      • WT-12440 (11257fbcef) — present on mongodb-8.0, mongodb-8.2, mongodb-8.3, mongodb-master.
      • WT-13990 (d48adcd6ad) — present on mongodb-8.2, mongodb-8.3, mongodb-master. Missing on mongodb-8.0.

      So 8.2+ already silently absorbed the fix when the refactor landed; only 8.0 still has the bug.

      Proposed minimal fix for mongodb-8.0

      A full backport of WT-13990 is invasive (new private header, struct relocation, public API rename). For an 8.0 hotfix we want the smallest change that produces the same observable behaviour — just zero the missing counters in the existing reset block in __wt_conn_btree_apply:

      diff --git a/src/conn/conn_dhandle.c b/src/conn/conn_dhandle.c
      --- a/src/conn/conn_dhandle.c
      +++ b/src/conn/conn_dhandle.c
      @@ -717,8 +717,10 @@
               if (WT_SESSION_IS_CHECKPOINT(session)) {
                   time_start = __wt_clock(session);
      -            conn->ckpt_apply = conn->ckpt_skip = 0;
      -            conn->ckpt_apply_time = conn->ckpt_skip_time = 0;
      +            conn->ckpt_apply = conn->ckpt_drop = conn->ckpt_lock = conn->ckpt_meta_check =
      +              conn->ckpt_skip = 0;
      +            conn->ckpt_apply_time = conn->ckpt_drop_time = conn->ckpt_lock_time =
      +              conn->ckpt_meta_check_time = conn->ckpt_skip_time = 0;
                   F_SET(conn, WT_CONN_CKPT_GATHER);
               }
      

      After the patch, the six affected handle stats behave like checkpoint_handle_applied and checkpoint_handle_skipped already do on 8.0 — zeroed at the start of each gather, set to the per-checkpoint value at the end.

      Definition of Done

      • Patch applied to mongodb-8.0 in src/conn/conn_dhandle.c.
      • Manual verification that checkpoint_handle_dropped, checkpoint_handle_locked, and checkpoint_handle_meta_checked report values on the order of the open dhandle count after each checkpoint (not monotonically increasing).
      • No changes required on mongodb-8.2 / mongodb-8.3 / mongodb-master — already fixed via WT-13990.
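
      For the manual verification step, one possible check on values sampled after successive checkpoints is sketched below. samples_look_cumulative is a hypothetical helper, not part of WiredTiger, and the heuristic is an assumption: a series that never decreases and ends higher than it started is treated as cumulative. A workload whose handle count really does grow at every checkpoint would also trip it, so the result still needs comparing against the open dhandle count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when the samples (one value per checkpoint, oldest first) look
 * like a counter that is never reset: at least three samples, none lower than
 * its predecessor, and the last strictly greater than the first.
 */
static bool
samples_look_cumulative(const uint64_t *samples, size_t count)
{
    size_t i;

    assert(samples != NULL);

    /* Too few samples to say anything. */
    if (count < 3)
        return (false);

    /* Any drop between checkpoints means the counter was reset. */
    for (i = 1; i < count; i++)
        if (samples[i] < samples[i - 1])
            return (false);

    /* Flat series are per-checkpoint values on a stable workload. */
    return (samples[count - 1] > samples[0]);
}
```

      With these rules, a series such as 4000, 8000, 12000 is flagged, while 4000, 4100, 3900 and 4000, 4000, 4000 are not.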

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel