-
Type:
Task
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
Fully Compatible
-
v8.2
-
0
-
None
-
3
-
TBD
-
None
-
None
-
None
-
None
-
None
-
None
-
None
During performance testing for the FLE Prefix/suffix/substring initiative https://jira.mongodb.org/browse/SPM-3012 , a critical performance issue was found in the substring use case.
The root cause is due to the way that the server rewrite of the text predicates is done. In particular, when generating a tag disjunction for aggregate expressions, the following aggregate expression is generated:
{ "$expr" : { "$or" : [ { "$in" : [ 1, "$a" ] }, { "$in" : [ 2, "$a" ] } ] } },
Where "$a" corresponds to the {}safeContent{} array, on which there is an index. This is a simplified example. In the actual substring case, we may have 1000+ nested $in expressions on "$a". In the DSI workload, we see a simple substring query take up to 14 minutes to return.
Upon investigation, the performance issue is due the query planner choosing a collection scan instead of leveraging the index scan.
It is currently not possible to express the query using match expressions, because during syntax review for the new FLE expressions, it was decided that we do not want to introduce new expressions into the match language.
The FLE query rewrite code is designed in a way in which it is only possible to rewrite an aggregation expression with an encrypted payload into another aggregation expression with tags for the encrypted predicate, or a match expression into a match expression with the tags for the encrypted predicate.
It is currently not possible to express the query as :
{ a: $in [1,2,..]}
Because that would require that $encStrContains have a match expression implementation, i.e :
{$encStrContains: { input: "$a", substring: "eve"}}
It is also not possible to express the existing query with a condensed $in aggregate expression, as it would require something like this :
// This does not work, $in aggregate expression semantics mean that // this expression would check for the existence of the array [1,2,3,4] //as an array element of the array $a {"$expr" : { "$in" : [ [1,2,3,4] , "$a" ] } }
Request:
We would like to understand what the feasibility of leveraging the index on "a" (i.e '_safeContent_') is for a nested aggregate expression like in the examples above.
Note, this may be related to indexing on array fields. The following also does not use the index:
db.coll.find({$expr: {$in: [1, "$a"]}})
- is depended on by
-
SERVER-108100 Enable index scan usage for $expr + $in expressions
-
- Backlog
-
- is related to
-
SERVER-37470 Lookup sub-pipeline is not using index with the $in operator
-
- Backlog
-
-
SERVER-108073 Pull $and/$or out of $expr MatchExpressions
-
- Backlog
-
-
SERVER-32549 allow $expr with $in expression to use indexes
-
- Closed
-
- related to
-
SERVER-108198 Fix FLE2 encrypted equality and range aggregations resorting to collection scan
-
- Backlog
-
-
SERVER-108179 FLE queries with encrypted text predicates fail to generate an index scan
-
- Closed
-
-
SERVER-108075 MQL expression to throw an error
-
- Closed
-