Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Query Optimization
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Consider the query

{a: 5, b: {$gt: 6}}}

and indexes

{a: 1, b: 1}
{b: 1, a: 1}

The two plans we generate with use either one of the indexes with point bounds on 'a' and a range on 'b'. The cost of these index scans is not the same! The (a,b) scan is significantly cheaper because there is a single index seek. The (b,a) scan is an index skip scan plan as it required seeking in the index # of unique values of b times.

~~SERVER-97529~~ implemented a heuristic to handle this case only considering the bounds of the equality prefix of the scan, that is the (b, a) scan will be estimated as is the bounds were b: (6, inf] a: [MinKey, MaxKey].

This fixed the plan selection for the case presented above, but it is fairly easy to construct a scenario where it leads to wildly inaccurate estimates. This ticket is to explore ideas to improve this estimation.

One idea is estimating the NDV of each equality prefix to estimate the number of seeks the index scan will need to perform and calibrating a corresponding cost coefficient. Histograms are good at estimating NDV, but sampling may have problems here.

related to

SERVER-97529 Select optimal plan based on cost

Closed

Assignee:: Unassigned
Reporter:: Ben Shteinfeld
Participants:: Ben Shteinfeld
Votes:: 0 Vote for this issue
Watchers:: 2 Start watching this issue

Created:: Feb 07 2025 05:33:04 PM UTC
Updated:: Feb 07 2025 05:33:21 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates