[SERVER-40636] $expr inside $match does not use indexes in many cases Created: 13/Apr/19  Updated: 03/Jun/19  Resolved: 03/Jun/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Egor Elagin Assignee: Asya Kamsky
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bad-aggr-query-explained.json     File bad-aggr-query-info.json     File bad-find-query-explained.json     File bad-find-query-info.json     File good-aggr-query-explained.json     File good-aggr-query-info.json    
Issue Links:
Duplicate
duplicates SERVER-34926 allow $expr with comparison bounded o... Closed
Participants:

 Description   

Related issues:

Summary:
**$expr inside $match stage seems to only respect index when using $eq operator. Other operators (e.g. $gt, $gte, $lt, $lte, $indexOfCP) would result collection scan.

I decided to create separate ticket because it looks to me as a bug rather than enhancement, since $expr operator within $match stage have no alternatives sometimes and for large datasets it becomes unusable if index is ignored.

Documentation states that "$expr does not support multikey indexes" which probably implies that is should support single field indexes.

Tested with ascending index on a single field for date, int and string for $gt, $gte, $lt, $lte, $indexOfCP operators. Both find and aggregate would result collection scan. Only $eq would result ixscan.

Tested on community version 4.0.8 / 4.0.9 and enterprise version 4.0.8, against single instance and replica set (distmod: "rhel70").

 

 

Steps to reproduce:

 

Sample set:

const size = 100000;
[...new Array(size)].map((x, i) => ({
    num: i,
    str: 'val_' + i,
    date: new Date(new Date() + (size/2 - i)*1000*60)
})) 

 

Index definitions:

createIndex({num: 1},  {name: `${colName}_num_asc` });
createIndex({str: 1},  {name: `${colName}_str_asc` });
createIndex({date: 1}, {name: `${colName}_date_asc` });

 
Find query:

find({$expr: {$gt: ['$num', 90000]}}, {explain: true}).limit(5)

 

Aggregation:

aggregate(
  [{$match: {$expr: {$gt: ['$num', 90000]}}}, {$limit: 5}], 
  { explain: true }) 

 

Expected result:
ixscan

Actual result:
collscan

Details:
Explained and profiled queries are attached.
'Good-*' attachements - result of $eq operator.
'Bad-*' attachments - result of $gt operator.

 

 



 Comments   
Comment by Asya Kamsky [ 03/Jun/19 ]

It's only correct to use index for $gt/$lt when $expr bounds it to same type (either by having two bounds of same type, or by having $type test for same type that is being compared to) because our index matching is type bracketed same as the find semantics. Aggregation comparison semantics are not so using an index would not give correct results if there's just a $gt or $lt expression.

Closing as duplicate of SERVER-34926 which is tracking using index with $expr comparisons when it's correct.

Comment by Craig Homa [ 07/May/19 ]

asya to investigate if this is in fact, a duplicate.

Comment by Asya Kamsky [ 17/Apr/19 ]

egor.elagin aggregation semantics (which is what $expr uses) are different from find/match semantics which is what indexes support. If we used the index for $gt (as an example) we would not get the same results as without the index (for instance if the value had other types than the one being compared to).

Comment by Eric Sedor [ 16/Apr/19 ]

Thanks egor.elagin. We will look into this.

Comment by Egor Elagin [ 14/Apr/19 ]

Update: it's definitely not a Major-P3 priority, and most probably not a bug. I think match inside $expr wasn't designed to use index from the beginning, `$eq` is special case, since it has custom optimization to use index.

Generated at Thu Feb 08 04:55:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.