[SERVER-59320] Use DISTINCT_SCAN on multikey indexes in special cases Created: 12/Aug/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ian Boros Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Given an index on {X:1, Y:1, Z:1, D:1} and a distinct() query which has filters on X, Y, and Z, and requests a distinct on D, we currently will not use DISTINCT_SCAN if D is multikey. SERVER-28952 explains why this was done.

 

I believe there is a special case where we can actually use DISTINCT_SCAN when D is multikey:

-D has  [MinKey, MaxKey] bounds.

AND one of the following is true:

1) None of X, Y,Z share a multikey path prefix with D.

2) All of X,Y,Z which do share a multikey path prefix with D have [MinKey,MaxKey] bounds.

 

For example, assuming D has  [MinKey, MaxKey] bounds:

If X is 'a.b' and D is 'c.d' then the optimization can be done.

If X is 'a.b' and D is 'a.c' and 'a' is not multikey, the optimization can be done.

If X is 'a.b' and D is 'a.c' and 'a' is multikey and the bounds on 'a.b' are not [MinKey, MaxKey] the optimization cannot be done.

 

 

These are the same conditions we use for determining whether a multikey index can provide a sort. See this code for a more detailed explanation.


Generated at Thu Feb 08 05:46:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.