Skip scratch buffer in __rec_append_orig_value for non-overflow value cells


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Engines - Transactions
    • 439.812
    • SE Transactions - 2026-06-19
    • 1

      Summary

      In __rec_append_orig_value (src/reconcile/rec_visibility.c), the on-page value is staged in a session scratch buffer via __wt_page_cell_data_ref_kv before being handed to __wt_upd_alloc. For non-overflow cells (unpack->type == WT_CELL_VALUE), __cell_data_ref does nothing beyond store->data = unpack->data; store->size = unpack->size; and __wt_upd_alloc immediately memcpy's the value into the new WT_UPDATE. The scratch buffer is wasted work on that path.

      Proposed change

      Pass a stack-local WT_ITEM populated from unpack->data / unpack->size directly to __wt_upd_alloc when unpack->type == WT_CELL_VALUE. Keep the scratch-buffer path for overflow cells, where __wt_ovfl_read needs a real buffer.

      /* Before */
      WT_ERR(__wt_scr_alloc(session, 0, &tmp));
      WT_ERR(__wt_page_cell_data_ref_kv(session, page, unpack, tmp));
      WT_ERR(__wt_upd_alloc(session, tmp, WT_UPDATE_STANDARD, &append, &size));
      
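      /* After */
      WT_ITEM cell_ref;
      WT_ITEM *src;
      if (unpack->type == WT_CELL_VALUE) {
          /* Non-overflow value: reference the on-page bytes directly, no scratch buffer. */
          cell_ref.data = unpack->data;
          cell_ref.size = unpack->size;
          src = &cell_ref;
      } else {
          /* Overflow value: the read needs a real buffer to land in. */
          WT_ERR(__wt_scr_alloc(session, 0, &tmp));
          WT_ERR(__wt_page_cell_data_ref_kv(session, page, unpack, tmp));
          src = tmp;
      }
      /* Only src->data and src->size are consumed; the value is copied into the new update. */
      WT_ERR(__wt_upd_alloc(session, src, WT_UPDATE_STANDARD, &append, &size));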
      

      WT_CELL_VALUE_COPY cells decode to unpack->type == WT_CELL_VALUE after the copy-cell restart in __wt_cell_unpack_kv (only unpack->raw retains the COPY tag), so they take the fast path correctly. WT_CELL_VALUE_OVFL keeps unpack->type == WT_CELL_VALUE_OVFL and goes through the scratch path. WT_CELL_VALUE_OVFL_RM is excluded by the existing assert.
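The routing described above can be illustrated with a toy model. This is only a sketch: every toy_* name and enum value below is hypothetical and is not part of WiredTiger. It assumes, as the paragraph states, that a copy cell ends up with a decoded type of plain value while its raw tag still says COPY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cell type tags; not the real WiredTiger values. */
typedef enum {
    TOY_CELL_VALUE,
    TOY_CELL_VALUE_COPY,
    TOY_CELL_VALUE_OVFL
} toy_cell_type;

/* Toy unpacked cell: raw is the tag found on the page, type is the tag after decoding. */
typedef struct {
    toy_cell_type raw;
    toy_cell_type type;
    const void *data;
    size_t size;
} toy_unpack;

/*
 * Model of the copy-cell restart: a COPY cell is decoded as the plain value it
 * refers to, so type becomes VALUE while raw keeps the COPY tag.
 */
static toy_unpack
toy_unpack_cell(toy_cell_type on_page_tag, const void *data, size_t size)
{
    toy_unpack u = {on_page_tag, on_page_tag, data, size};

    assert(data != NULL || size == 0);
    if (on_page_tag == TOY_CELL_VALUE_COPY)
        u.type = TOY_CELL_VALUE;
    return (u);
}

/* The proposed branch keys on the decoded type, never on the raw tag. */
static bool
toy_takes_fast_path(const toy_unpack *u)
{
    return (u->type == TOY_CELL_VALUE);
}
```

In this model, plain and copy cells both report true from toy_takes_fast_path, and an overflow cell reports false and would fall through to the scratch-buffer path.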

      Motivation

      Flamegraph from BF-41977 profile patch 6a15b28e (tpcc_majority_out_of_cache, clean mainline, WT-17490 + WT-17598 already merged):

      • __rec_append_orig_value self-time: 1.19% DSC vs 0.06% ASC (+1.13 pp)
      • Called more often in DSC because the 30-min snapshot history window makes upd_select->upd_saved == true more frequent per reconciliation (see BF-41977 flamegraph attachment bf41977_dsc_workload_flamegraph.svg.gz)

      Why this is low priority

      Single-run sys-perf result on DSC tpcc_majority_out_of_cache (patch 6a166576, comparison 6a16707e): patch tpmC = 39,942.5 vs. a 7-day stable mean of 43,528 (CoV 3.93%). The result is inconclusive: the workload's run-to-run noise is larger than the predicted gain. Demonstrating a roughly 1% effect against a roughly 4% CoV needs a 3-5 clone multipatch.
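As a rough aid for sizing a multi-clone run, the sketch below assumes independent runs that share the same CoV. Under that assumption the relative standard error of an n-run mean is CoV / sqrt(n). The function name is hypothetical, and the calculation covers only the noise in one arm's mean, not a full power analysis.

```c
#include <assert.h>

/*
 * Smallest number of runs n for which cov / sqrt(n) <= target, checked in
 * squared form (cov^2 <= target^2 * n) so no square root is needed.
 * Both arguments are percentages, e.g. 3.93 for a CoV of 3.93%.
 */
static unsigned
toy_runs_for_standard_error(double cov_pct, double target_pct)
{
    unsigned n;

    assert(cov_pct >= 0.0 && target_pct > 0.0);
    for (n = 1; cov_pct * cov_pct > target_pct * target_pct * (double)n; ++n)
        ;
    return (n);
}
```

Calling toy_runs_for_standard_error(3.93, target) with a chosen target gives the clone count at which the noise in the mean drops to that target under the stated assumptions.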

      The change itself is small (20 lines), safe, and based on a real flamegraph hotspot. It is a reasonable candidate to bundle with other reconciliation micro-optimizations if they are worked on at the same time.

      Verification

      • __cell_data_ref only copies data/size fields for WT_CELL_VALUE (src/include/cell_inline.h line 1855)
      • __wt_upd_alloc memcpy's the value into the new WT_UPDATE (src/include/txn_inline.h line 1476), so the stack-local WT_ITEM doesn't need to outlive the call
      • Builds cleanly (verified locally)
      • Sys-perf run on DSC tpcc_majority_out_of_cache completed without errors (just no measurable improvement at n=1)
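The lifetime argument in the second bullet can be checked in isolation with a minimal model. The toy_item and toy_update types and both functions are hypothetical stand-ins, not the WiredTiger definitions. The sketch assumes only what the bullet states: the allocator copies the bytes during the call and keeps no pointer to the item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_ITEM: a borrowed pointer plus a length. */
typedef struct {
    const void *data;
    size_t size;
} toy_item;

/* Hypothetical stand-in for WT_UPDATE: owns its own copy of the value bytes. */
typedef struct {
    size_t size;
    unsigned char data[];
} toy_update;

/* Allocate an update and copy the value into it; the item is only read during the call. */
static toy_update *
toy_upd_alloc(const toy_item *value)
{
    toy_update *upd;

    assert(value != NULL);
    if ((upd = malloc(sizeof(*upd) + value->size)) == NULL)
        return (NULL);
    upd->size = value->size;
    if (value->size != 0)
        memcpy(upd->data, value->data, value->size);
    return (upd);
}

/* Fast path: describe the on-page bytes with a stack-local item, then copy once. */
static toy_update *
toy_append_from_page(const unsigned char *page_data, size_t size)
{
    toy_item cell_ref;

    cell_ref.data = page_data;
    cell_ref.size = size;
    return (toy_upd_alloc(&cell_ref)); /* cell_ref is not referenced after the copy. */
}
```

Because the returned update holds its own bytes, later changes to the source buffer do not affect it. The caller releases the update with free.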

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Haribabu Kommi
            Votes:
            0
            Watchers:
            1