-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Consider the query
{a: 5, b: {$gt: 6}}}
and indexes
{a: 1, b: 1} {b: 1, a: 1}
The two plans we generate with use either one of the indexes with point bounds on 'a' and a range on 'b'. The cost of these index scans is not the same! The (a,b) scan is significantly cheaper because there is a single index seek. The (b,a) scan is an index skip scan plan as it required seeking in the index # of unique values of b times.
SERVER-97529 implemented a heuristic to handle this case only considering the bounds of the equality prefix of the scan, that is the (b, a) scan will be estimated as is the bounds were b: (6, inf] a: [MinKey, MaxKey].
This fixed the plan selection for the case presented above, but it is fairly easy to construct a scenario where it leads to wildly inaccurate estimates. This ticket is to explore ideas to improve this estimation.
One idea is estimating the NDV of each equality prefix to estimate the number of seeks the index scan will need to perform and calibrating a corresponding cost coefficient. Histograms are good at estimating NDV, but sampling may have problems here.
- related to
-
SERVER-97529 Select optimal plan based on cost
-
- Closed
-