Expose lightweight stat for approximate leaf page count per table

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Engines - Transactions
    • 359.026
    • None
    • None

      Background

      The MongoDB query optimizer needs to estimate the number of leaf pages in a WiredTiger table for cost modeling and join-order optimization.

      Today we approximate this as: 

      (WT_STAT_DSRC_BLOCK_SIZE - WT_STAT_DSRC_BLOCK_REUSE_BYTES) / leaf_page_max

      i.e., bytes-on-disk / 32KB. In practice, actual leaf page sizes deviate significantly from leaf_page_max, giving ~40% error on many TPC-H collections compared to WT_STAT_DSRC_BTREE_ROW_LEAF.

      WT_STAT_DSRC_BTREE_ROW_LEAF is accurate but requires a tree walk (all statistics), which is too expensive for production query planning.

      Prototype

      We [prototyped|https://github.com/10gen/mongo/pull/51825] a sampling approach: open a random cursor, cast to WT_CURSOR_BTREE, and read ref->page->entries to estimate avg docs-per-leaf-page. This reduced error to <5% on TPC-H, but it depends on private WT headers and internal structs, so it's not production-safe.

      Request

      Expose a stable stat or API (per table) that provides either:

      • Approximate leaf page count, or
      • Approximate avg records-per-leaf-page

      without requiring a full tree walk.

      Constraints:

      • Overhead must be small enough for regular use in production query planning paths.
      • Approximate / eventually consistent is fine, as long as it avoids large systematic bias (e.g., <10% avg error vs the current ~40%).
      • Must not require MongoDB to depend on private headers or internal WT structures.

      Possible approaches:

      • A new DSRC stat tracking approximate leaf page count via incremental updates on splits/merges.
      • A DSRC stat for avg records-per-leaf-page, updated via background sampling or during normal cursor traversal.
      • A lightweight statistics level that enables this metric without a full tree walk.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Ben Shteinfeld
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated: