-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
We've identified that our current estimates for the number of pages in a collection can be quite wrong.
sf collName costModelEstimatedPages btreeLeafPages pctError --- -------- ----------------------- -------------- -------- 0.1 customer 53.8 38 41.6% 0.1 lineitem 2435.5 4140 -41.2% 0.1 nation 0.6 1 -40.0% 0.1 orders 430.5 624 -31.0% 0.1 part 53.8 45 19.6% 0.1 partsupp 215.3 355 -39.4% 0.1 region 0.6 1 -40.0% 0.1 supplier 4 3 33.3% 1 customer 608.9 757 -19.6% 1 lineitem 19484 20187 -3.5% 1 nation 0.6 1 -40.0% 1 orders 3444.3 3938 -12.5% 1 part 608.9 1092 -44.2% 1 partsupp 2435.5 3660 -33.5% 1 region 0.6 1 -40.0% 1 supplier 38.1 22 73.2%
This is because we currently estimate the number of pages by assuming that WT will create pages of 32K bytes post compression (leaf_page_max). Then we get the number of bytes used in the wt file and divide by 32K to get an estimate of the number of leaf pages. In reality, the leaf pages do not end up being close to 32K (we need to better understand why this is).
An alternative idea is to use sampling to estimate the average number of documents per page and then using the document count to estimate the number of pages. I prototyped this approach here.
This ticket tracks the work to productionize this solution, which will involve negotiating with the WT team about exposing an API that allows for the cursor to know how many documents live on the current page.
- blocks
-
SERVER-124504 Join optimization testing: Re-enable compression by default in plan stability test
-
- Blocked
-
- is related to
-
WT-17175 Expose lightweight stat for approximate leaf page count per table
-
- Needs Scheduling
-