[SERVER-45689] DISTINCT_SCAN candidate plans should be generated and evaluated with the multi-planner Created: 21/Jan/20 Updated: 04/Jan/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||
| Sprint: | QO 2022-09-19 | ||||||||||||
| Participants: | |||||||||||||
| Case: | (copied to CRM) | ||||||||||||
| Description |
|
The DISTINCT_SCAN stage implements an optimized specialization of an index scan that is appropriate for a subset of distinct or aggregate operations. It skips duplicate keys by seeking in the index.

The planning logic for DISTINCT_SCAN is implemented outside of the planner. It first invokes QueryPlanner::plan(), and then checks whether any of the resulting plans can be correctly converted to a DISTINCT_SCAN. However, as soon as we are able to construct our first DISTINCT_SCAN plan, we pass it off to the execution engine without considering other candidates. See https://github.com/mongodb/mongo/blob/48cd578fa9c3ef317666ca475f9ee14c1fe0bc4f/src/mongo/db/query/get_executor.cpp#L1535-L1556.

It is possible that there are multiple DISTINCT_SCAN plans, and that one will outperform another. The efficiency of a DISTINCT_SCAN depends on the position of the field we're "distincting" in the index key pattern, as well as on the number of unique values in the collection for the preceding key pattern fields. By simply selecting the first DISTINCT_SCAN, we might select a plan that is substantially suboptimal.

Instead, we should generate a set of DISTINCT_SCAN candidate plans. These candidates could then be scored and ranked according to our usual multi-planning algorithm. A few additional concerns that come to mind:
|
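To illustrate why the choice among DISTINCT_SCAN candidates matters, here is a minimal Python sketch of a skip scan over sorted index keys. It is a simplified model, not the server's implementation: the function `distinct_scan`, the index names `a_1_b_1` and `b_1_a_1`, the sample documents, and the use of seek count as a cost are all assumptions made for illustration, and the seek count is only a stand-in for whatever score the multi-planner would actually assign.

```python
from bisect import bisect_right


def distinct_scan(index_keys, field_pos):
    """Simulate a skip scan over sorted index keys (hypothetical model).

    Returns (distinct values of the field at field_pos, number of seeks).
    """
    keys = sorted(index_keys)
    # Key prefixes up to and including the distinct field; these stay sorted.
    prefixes = [key[: field_pos + 1] for key in keys]
    values = set()
    seeks = 0
    pos = 0
    while pos < len(keys):
        seeks += 1
        values.add(keys[pos][field_pos])
        # Seek past every key that shares the current prefix.
        pos = bisect_right(prefixes, prefixes[pos], pos)
    return values, seeks


# Sample data (assumed): "a" has 3 unique values, "b" is unique per document.
docs = [{"a": i % 3, "b": i} for i in range(30)]

# Two candidate indexes for distinct("a"): (index keys, position of "a").
candidates = {
    "a_1_b_1": ([(d["a"], d["b"]) for d in docs], 0),
    "b_1_a_1": ([(d["b"], d["a"]) for d in docs], 1),
}

results = {name: distinct_scan(keys, pos) for name, (keys, pos) in candidates.items()}

# Rank candidates by seek count instead of taking whichever came first.
best = min(results, key=lambda name: results[name][1])
```

Both candidates return the same distinct values, but in this model the index with "a" in the leading position needs one seek per unique value of "a", while the index with "a" behind the high-cardinality field "b" needs one seek per unique ("b", "a") prefix. Picking the first convertible plan could land on either one.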
| Comments |
| Comment by Brenda Rodriguez [ 31/Oct/22 ] |
|
We are sending this back to the backlog and to director triage for assignment. |
| Comment by Joel Redman (Inactive) [ 22/Oct/21 ] |
|
Initial thoughts:
|
| Comment by Charlie Swanson [ 08/Sep/21 ] |
|
Flipping this back to triage based on christopher.harris's comment, which should help steve.la and bernard.gorman make a more informed scheduling decision. |