[SERVER-54427] DISTINCT_SCAN can be used with compound index when filter is on non-multikey field Created: 09/Feb/21  Updated: 11/Nov/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-27494 Avoid unwind on multi-key index field... Backlog
Assigned Teams:
Query Optimization
Operating System: ALL
Participants:
Case:

 Description   

If you have an index on a:1, b:1 and distinct command or agg equivalent

db.foo.distinct("b", {a:1})

we use DISTINCT_SCAN unless b is multikey - and it's not clear why - it doesn't need IXSCAN nor FETCH (which is what it shows using).



 Comments   
Comment by Michael M [ 11/Nov/23 ]

Currently I'm using db.collection.distinct to get values from a multikey index. It's fast because it uses DISTINCT_SCAN. However, there's a 16mb document limit.

db.collection.aggregate with $unwind and $group stage does a COLLSCAN or IXSCAN with FETCH (when forced with a hint) so it has to read the whole collection.

 

Are there any plans to implement the optimization to emit DISTINCT_SCAN for such cases? 

Is there currently another way to get distinct values from a multikey index with DISTINCT_SCAN beyond the 16m limit?

 

Thanks.

Generated at Thu Feb 08 05:33:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.