Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Query Optimization
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We've identified that our current estimates for the number of pages in a collection can be quite wrong.

sf   collName  costModelEstimatedPages  btreeLeafPages  pctError
---  --------  -----------------------  --------------  --------
0.1  customer  53.8                     38              41.6%
0.1  lineitem  2435.5                   4140            -41.2%
0.1  nation    0.6                      1               -40.0%
0.1  orders    430.5                    624             -31.0%
0.1  part      53.8                     45              19.6%
0.1  partsupp  215.3                    355             -39.4%
0.1  region    0.6                      1               -40.0%
0.1  supplier  4                        3               33.3%
1    customer  608.9                    757             -19.6%
1    lineitem  19484                    20187           -3.5%
1    nation    0.6                      1               -40.0%
1    orders    3444.3                   3938            -12.5%
1    part      608.9                    1092            -44.2%
1    partsupp  2435.5                   3660            -33.5%
1    region    0.6                      1               -40.0%
1    supplier  38.1                     22              73.2%

This is because we currently estimate the number of pages by assuming that WT will create pages of 32K bytes post compression (leaf_page_max). Then we get the number of bytes used in the wt file and divide by 32K to get an estimate of the number of leaf pages. In reality, the leaf pages do not end up being close to 32K (we need to better understand why this is).

An alternative idea is to use sampling to estimate the average number of documents per page and then using the document count to estimate the number of pages. I prototyped this approach here.

This ticket tracks the work to productionize this solution, which will involve negotiating with the WT team about exposing an API that allows for the cursor to know how many documents live on the current page.

blocks

SERVER-124504 Join optimization testing: Re-enable compression by default in plan stability test

Blocked

is related to

WT-17175 Expose lightweight stat for approximate leaf page count per table

Closed

related to

SERVER-129082 Use WiredTiger's approximate leaf page count in cost calculations

In Progress

Assignee:: Unassigned
Reporter:: Ben Shteinfeld
Participants:: Ben Shteinfeld
Votes:: 0 Vote for this issue
Watchers:: 3 Start watching this issue

Created:: Apr 14 2026 08:14:03 PM UTC
Updated:: Jun 16 2026 08:44:52 AM UTC
Resolved:: Jun 16 2026 08:44:52 AM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates

PagerDuty