[SERVER-31804] $expr rewrite optimization can lead to incorrect query results Created: 02/Nov/17  Updated: 30/Oct/23  Resolved: 06/Nov/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: 3.6.0-rc2
Fix Version/s: 3.6.0-rc3

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: James Wahlin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-31760 Lookup sub-pipeline is not using inde... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2017-11-13
Participants:

 Description   

The 3.6 stable release series adds support for $expr, a way to use the aggregation expression language within a $match predicate:

https://docs.mongodb.com/master/reference/operator/query/expr/

The query engine optimizes $expr by attempting to fully or partially convert the $expr into a MatchExpression which can then be used to generate index bounds in the query planner. However, this rewrite optimization is incorrect if the $expr expresses a match over a path that contains any array whatsoever, regardless of whether the path is dotted. Consider the following example:

> db.c.drop();
> db.c.insert({a: [1, 2, 3]});
> db.c.find({$expr: {$gt: ["$a", 4]}});
// No results. This is incorrect because the array [1, 2, 3] is considered greater than
// the number 4 in the aggregation language. In fact, all arrays are considered greater
// than all numbers in aggregation, since the first part of the comparison is the
// canonical type.
 
// The explain shows that the match expression {a: {$gt: 4}} has been generated by
// the $expr rewrite optimization.
> db.c.find({$expr: {$gt: ["$a", 4]}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"a" : {
						"$gt" : 4
					}
				},
				{
					"$expr" : {
						"$gt" : [
							"$a",
							{
								"$const" : 4
							}
						]
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"a" : {
							"$gt" : 4
						}
					},
					{
						"$expr" : {
							"$gt" : [
								"$a",
								{
									"$const" : 4
								}
							]
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "storchbox",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}

The essence of the problem is that the match language has implicit array traversal semantics whereas the aggregation expression language does not. {$gt: ["$a", 4]} and {a: {$gt: 4}} may look the same syntactically, but their meanings are not equivalent.



 Comments   
Comment by Githook User [ 06/Nov/17 ]

Author:

{'name': 'James Wahlin', 'username': 'jameswahlin', 'email': 'james@mongodb.com'}

Message: SERVER-31804 Disable $expr rewrite optimization

To address correctness issues involving comparison of:

Comment by Asya Kamsky [ 02/Nov/17 ]

It's not clear to me what $expr:{$gt:["$a", 4]} is supposed to mean so I'm changing the example to show how $eq uses an index without returning the possibly expected result but $in which is the correct way in agg to express "is this value in this array" does not rewrite (nor does it use an index, even though it could)

db.c.explain().find({$expr: {$in: [1, "$a"]}})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$expr" : {
				"$in" : [
					{
						"$const" : 1
					},
					"$a"
				]
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$expr" : {
					"$in" : [
						{
							"$const" : 1
						},
						"$a"
					]
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Asyas-MacBook-Pro.local",
		"port" : 27017,
		"version" : "3.6.0-rc1",
		"gitVersion" : "979ee612682b77d9cabaafae10787fbb578cd32a"
	},
	"ok" : 1
}
db.c.find({$expr: {$in: [1, "$a"]}});
{ "_id" : ObjectId("59fb76e31881e0be82b16b18"), "a" : [ 1, 2, 3 ] }
db.c.find({$expr: {$eq: [1, "$a"]}});
// no result
db.c.explain().find({$expr: {$eq: [1, "$a"]}});
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"a" : {
						"$eq" : 1
					}
				},
				{
					"$expr" : {
						"$eq" : [
							{
								"$const" : 1
							},
							"$a"
						]
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$expr" : {
					"$eq" : [
						{
							"$const" : 1
						},
						"$a"
					]
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"a" : 1
				},
				"indexName" : "a_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"a" : [
						"a"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[1.0, 1.0]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Asyas-MacBook-Pro.local",
		"port" : 27017,
		"version" : "3.6.0-rc1",
		"gitVersion" : "979ee612682b77d9cabaafae10787fbb578cd32a"
	},
	"ok" : 1
}

So it seems incorrect can be only $gt/$lt because of aggregation and find having different rules about comparisons across types.

If there is a separate ticket for using index for the "$in" case, maybe my comment would be more relevant there - I can open a new ticket if there isn't already one.

Comment by David Storch [ 02/Nov/17 ]

Per in-person discussion with james.wahlin, it seems that there are even more problems of this nature:

  • Comparison to null queries.
  • Comparison to undefined queries.
  • Comparison to NaN queries.

But the biggest problem, which likely makes the $expr rewrite optimization impossible without strictly enforced schemas, is that match expressions, unlike agg expressions, do not have "type bracketing" behavior:

> db.c.find()
{ "_id" : ObjectId("59fb44172cfe03501f22015f"), "a" : 1 }
{ "_id" : ObjectId("59fb44192cfe03501f220160"), "a" : "str" }
{ "_id" : ObjectId("59fb44202cfe03501f220161"), "a" : ISODate("2017-11-02T16:13:20.367Z") }
> db.c.aggregate([{$project: {foo: {$lt: ["$a", /foo/]}}}])
{ "_id" : ObjectId("59fb44172cfe03501f22015f"), "foo" : true }
{ "_id" : ObjectId("59fb44192cfe03501f220160"), "foo" : true }
{ "_id" : ObjectId("59fb44202cfe03501f220161"), "foo" : true }
> db.c.find({$expr: {$lt: ["$a", /foo/]}})
// No results. This is inconsistent with the result of the $project above.

Comment by David Storch [ 02/Nov/17 ]

I can see two ways to fix this:

  1. Disable the $expr rewrite optimization.
  2. Use path-level multikeyness metadata to inform whether a rewrite is correct.

I propose that we implement fix #1 under this ticket and pursue #2 under related ticket SERVER-31760.

Generated at Thu Feb 08 04:28:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.