[SERVER-32549] allow $expr with $in expression use a non-multikey index Created: 04/Jan/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 25 |
| Labels: | storch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||
| Description |
|
Similarly to Semantically
If we restrict using an index for $expr $in expressions to non-multikey indexes/paths, then we will only get back matches from the index that are guaranteed to be correct semantically to the aggregation meaning of $in. This is also exactly equivalent query that can use any index and return only agg semantics:
The second part prevents reaching into arrays for comparison. |
| Comments |
| Comment by Joerg Baier [ 02/Sep/21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sharing my concerns from a related issue, as this seems to be more popular. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 21/Nov/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry wrong ticket | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Anders Cassidy [ 30/Sep/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The $unwind -> $match -> $group workaround does not scale well at all...performance is terrible with more than a few thousand rows | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sten [ 03/Nov/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Allowing $expr and $in to use non-multikey (and perhaps multikey) indices could be a huge performance improvement for relational queries with aggregation and $lookup stages (and nested lookup stages). Collection: titles
Collection: messages
Collection: ranks
In order to aggregate messages and ranks within each title record we could use the following aggregation pipeline:
There is a standard index for the "_id" field in each collection. It would be awesome if indices could be used for these kind of queries, because this would allow for relations of one-to-many in a traditional sense. It comes down to allowing indices for these kind of queries indeed (for example ranks collection): Not using indexes (one-to-many) query:
It is worth noting that $match stages using $eq are infact indexed. This allows for high performance one-to-one relation searches: Using indexes (one-to-one) query:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 19/Oct/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There is a workaround for cases where this is an expressive "$lookup" on an array of scalars and that's to first $unwind and then use $eq rather than $in. It doesn't help with using $in+$expr outside of $lookup. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 09/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Could you provide guidance on priority? It's still not clear to me whether you believe that we need to schedule this work alongside 3.8 development. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 08/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'm ok with it being in follow-on improvement though it's a pretty painful shortcoming of original implementation. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 05/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
asya, I'm to suggest removing this from the "More expressive $lookup" project and tracking it as a follow-on improvement. Does that work for you? Also, could you provide guidance on priority? Is this something we should schedule (and maybe even aim to backport to 3.6)? |