Add histogram statistics to improve understanding of tree shape


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2025-11-07
    • 5

      Periodically we have questions about the emergent behavior of WiredTiger's BTrees.

      • Do some trees accumulate smaller-than-desired pages over time?
      • How much space do we lose to internal fragmentation (i.e., the padding we add to round pages up to a multiple of the allocation size)?
      • How much space is used for managing the visibility of soon-to-be-obsolete records (i.e., time window information and deleted records)?

      It would be expensive to walk BTrees regularly to get a completely accurate profile of this information. But we could get some useful insight by collecting histograms of these attributes when we write pages.

      The proposal here is to add table-level and connection-level histogram statistics that track, for each leaf page write:

      • How much smaller the page is than leaf_page_max
      • How much padding was added in the block manager to round up to a multiple of the allocation size
      • The amount of visibility-related information on the page we expect would be removed if all records were older than the history window.
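
      As a rough illustration of the three measurements above, here is a minimal sketch in C. It is not WiredTiger code: every type and function name (hist_t, leaf_write_stats_t, hist_record, leaf_write_record) is hypothetical, the eight equal-width buckets are an arbitrary choice, and the visibility byte count is assumed to be computed by the caller during reconciliation.

```c
#include <stdint.h>

#define HIST_BUCKETS 8 /* Hypothetical: eight equal-width buckets per histogram. */

typedef struct {
    uint64_t bucket[HIST_BUCKETS];
} hist_t;

/* Hypothetical per-table (or per-connection) statistics for leaf page writes. */
typedef struct {
    hist_t slack;      /* Bytes below leaf_page_max, relative to leaf_page_max. */
    hist_t padding;    /* Bytes of rounding padding, relative to the allocation size. */
    hist_t visibility; /* Visibility-related bytes, relative to the page size. */
} leaf_write_stats_t;

/* Count a value in a histogram covering [0, range); larger values go in the last bucket. */
static void
hist_record(hist_t *h, uint64_t value, uint64_t range)
{
    uint64_t idx;

    if (range == 0)
        return;
    idx = value * HIST_BUCKETS / range;
    if (idx >= HIST_BUCKETS)
        idx = HIST_BUCKETS - 1;
    ++h->bucket[idx];
}

/* Round a size up to the next multiple of the allocation size. */
static uint64_t
round_up(uint64_t size, uint64_t alloc_size)
{
    return ((size + alloc_size - 1) / alloc_size) * alloc_size;
}

/*
 * Record one leaf page write. The visibility_bytes argument is an assumption: the
 * caller is expected to have measured the time window and delete information that
 * would disappear if every record were older than the history window.
 */
static void
leaf_write_record(leaf_write_stats_t *s, uint64_t page_size, uint64_t leaf_page_max,
  uint64_t alloc_size, uint64_t visibility_bytes)
{
    uint64_t padding, slack;

    slack = page_size < leaf_page_max ? leaf_page_max - page_size : 0;
    padding = round_up(page_size, alloc_size) - page_size;

    hist_record(&s->slack, slack, leaf_page_max);
    hist_record(&s->padding, padding, alloc_size);
    hist_record(&s->visibility, visibility_bytes, page_size);
}
```

      Bucketing each value relative to a natural upper bound keeps the histograms comparable across tables with different leaf_page_max and allocation size settings. Absolute byte buckets would be an equally reasonable choice.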

      We may want to refine exactly what we collect and how. The proposal here is a starting point.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Keith Smith
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: