[SERVER-31804] $expr rewrite optimization can lead to incorrect query results Created: 02/Nov/17 Updated: 30/Oct/23 Resolved: 06/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | 3.6.0-rc2 |
| Fix Version/s: | 3.6.0-rc3 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | David Storch | Assignee: | James Wahlin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Query 2017-11-13 |
| Participants: |
| Description |
|
The 3.6 stable release series adds support for $expr, a way to use the aggregation expression language within a $match predicate (https://docs.mongodb.com/master/reference/operator/query/expr/). The query engine optimizes $expr by attempting to convert all or part of the $expr into a MatchExpression, which the query planner can then use to generate index bounds. However, this rewrite is incorrect if the $expr expresses a match over a path that contains any array whatsoever, regardless of whether the path is dotted. Consider the following example:
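The essence of the problem is that the match language has implicit array traversal semantics, whereas the aggregation expression language does not. {$gt: ["$a", 4]} and {a: {$gt: 4}} may look equivalent, but their meanings differ: the match predicate is satisfied if "a" or any element of an array stored in "a" is greater than 4, while the agg expression compares the entire value of "a", array or not, against 4. |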
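The traversal difference can be sketched with a simplified Python model of the two semantics. This is not MongoDB code: `match_gt` and `agg_gt` are hypothetical helpers, the operand is assumed to be a number, and only numbers and flat arrays are modelled. The one fact the model relies on is that, in the BSON comparison order used by aggregation, every array sorts after every number.

```python
# Simplified model (hypothetical helpers, not MongoDB source) of how the two
# languages evaluate "a > operand" when operand is a number.
# Assumption: field values are numbers or flat arrays of numbers.
from numbers import Real


def match_gt(field_value, operand):
    """Match language, {a: {$gt: operand}}: an array field matches if ANY
    element is a number greater than operand (implicit array traversal)."""
    elements = field_value if isinstance(field_value, list) else [field_value]
    return any(isinstance(e, Real) and e > operand for e in elements)


def agg_gt(field_value, operand):
    """Agg language, {$gt: ["$a", operand]}: no traversal. The array is a
    single value, and BSON comparison order ranks arrays above numbers."""
    if isinstance(field_value, list):
        return True
    return field_value > operand


for doc in ({"a": 5}, {"a": [1, 2]}, {"a": [1, 5]}):
    print(doc, match_gt(doc["a"], 4), agg_gt(doc["a"], 4))
```

For {a: [1, 2]} the model prints False for the match predicate and True for the agg expression, so rewriting the $expr into a MatchExpression (or deriving index bounds from one) would drop a document that the $expr itself accepts.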
| Comments |
| Comment by Githook User [ 06/Nov/17 ] |
|
Author: James Wahlin (jameswahlin, james@mongodb.com)
Message: To address correctness issues involving comparison of:
| Comment by Asya Kamsky [ 02/Nov/17 ] |
|
It's not clear to me what $expr: {$gt: ["$a", 4]} is supposed to mean, so I'm changing the example to show that $eq uses an index without returning the possibly expected result, whereas $in, which is the correct way in agg to express "is this value in this array", is not rewritten (nor does it use an index, even though it could).
So it seems that only $gt/$lt can produce incorrect results, because aggregation and find have different rules for comparisons across types. If there is a separate ticket for using an index in the $in case, my comment may be more relevant there; I can open a new ticket if there isn't one already. |
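The $eq versus $in distinction can be illustrated with the same kind of simplified model. The helpers below are hypothetical and only approximate the documented semantics: inside $expr, $eq compares whole values, so an array never equals a scalar, while $in is the aggregation operator for testing membership in an array.

```python
# Simplified model (hypothetical helpers, not MongoDB source) of equality
# versus membership. Assumption: only numbers and flat arrays are modelled.


def match_eq(field_value, operand):
    """Match language, {a: operand}: true if the field equals operand or,
    when the field is an array, if any element equals operand."""
    if field_value == operand:
        return True
    return isinstance(field_value, list) and operand in field_value


def agg_eq(field_value, operand):
    """Agg language, {$eq: ["$a", operand]}: whole-value equality only."""
    return field_value == operand


def agg_in(operand, array_value):
    """Agg language, {$in: [operand, "$a"]}: membership test. Like the real
    operator, this model rejects a non-array second argument."""
    if not isinstance(array_value, list):
        raise ValueError("$in requires an array as its second argument")
    return operand in array_value


doc = {"a": [4, 5]}
print(match_eq(doc["a"], 4), agg_eq(doc["a"], 4), agg_in(4, doc["a"]))
```

For {a: [4, 5]} the match predicate and $in agree (both true), while $eq inside $expr is false. That is the expected agg behavior rather than a rewrite bug.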
| Comment by David Storch [ 02/Nov/17 ] |
|
Per in-person discussion with james.wahlin, it seems that there are even more problems of this nature:
But the biggest problem, which likely makes the $expr rewrite optimization impossible without strictly enforced schemas, is that agg expressions, unlike match expressions, do not have "type bracketing" behavior:
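A minimal sketch of the type-bracketing difference, assuming only numbers and strings are involved: a match predicate such as {a: {$gt: 4}} compares only values of the same type as the operand, while the aggregation $gt orders values of different types by BSON comparison order, in which strings sort after numbers. The helper names and the numeric ranks are hypothetical simplifications.

```python
# Simplified model (hypothetical helpers, not MongoDB source) of type
# bracketing. Assumption: only numbers and strings are modelled; in BSON
# comparison order, numbers sort before strings.


def type_rank(value):
    """Relative rank of the value's type in a simplified BSON order."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return 2  # numbers
    if isinstance(value, str):
        return 3  # strings
    raise TypeError(f"type not modelled: {type(value).__name__}")


def match_gt(field_value, operand):
    """Match language, {a: {$gt: operand}}: type bracketing means a value
    whose type differs from the operand's type never matches."""
    if type_rank(field_value) != type_rank(operand):
        return False
    return field_value > operand


def agg_gt(field_value, operand):
    """Agg language, {$gt: ["$a", operand]}: values of different types are
    ordered by type rank instead of being excluded."""
    lhs, rhs = type_rank(field_value), type_rank(operand)
    if lhs != rhs:
        return lhs > rhs
    return field_value > operand


print(match_gt("foo", 4), agg_gt("foo", 4))
```

Because agg_gt("foo", 4) is true but match_gt("foo", 4) is false, rewriting {$expr: {$gt: ["$a", 4]}} as {a: {$gt: 4}} would drop string-valued documents unless the schema guarantees that "a" is always numeric.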
| Comment by David Storch [ 02/Nov/17 ] |
|
I can see two ways to fix this:
I propose that we implement fix #1 under this ticket and pursue #2 under related ticket |