[SERVER-59320] Use DISTINCT_SCAN on multikey indexes in special cases Created: 12/Aug/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Given an index on {X:1, Y:1, Z:1, D:1} and a distinct() query which has filters on X, Y, and Z, and requests a distinct on D, we currently will not use DISTINCT_SCAN if D is multikey.
I believe there is a special case where we can actually use DISTINCT_SCAN when D is multikey: -D has [MinKey, MaxKey] bounds. AND one of the following is true: 1) None of X, Y,Z share a multikey path prefix with D. 2) All of X,Y,Z which do share a multikey path prefix with D have [MinKey,MaxKey] bounds.
For example, assuming D has [MinKey, MaxKey] bounds: If X is 'a.b' and D is 'c.d' then the optimization can be done. If X is 'a.b' and D is 'a.c' and 'a' is not multikey, the optimization can be done. If X is 'a.b' and D is 'a.c' and 'a' is multikey and the bounds on 'a.b' are not [MinKey, MaxKey] the optimization cannot be done.
These are the same conditions we use for determining whether a multikey index can provide a sort. See this code for a more detailed explanation. |