[SERVER-54427] DISTINCT_SCAN can be used with compound index when filter is on non-multikey field Created: 09/Feb/21 Updated: 11/Nov/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Optimization
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
If you have an index on a:1, b:1 and distinct command or agg equivalent
we use DISTINCT_SCAN unless b is multikey - and it's not clear why - it doesn't need IXSCAN nor FETCH (which is what it shows using). |
| Comments |
| Comment by Michael M [ 11/Nov/23 ] |
|
Currently I'm using db.collection.distinct to get values from a multikey index. It's fast because it uses DISTINCT_SCAN. However, there's a 16mb document limit. db.collection.aggregate with $unwind and $group stage does a COLLSCAN or IXSCAN with FETCH (when forced with a hint) so it has to read the whole collection.
Are there any plans to implement the optimization to emit DISTINCT_SCAN for such cases? Is there currently another way to get distinct values from a multikey index with DISTINCT_SCAN beyond the 16m limit?
Thanks. |