[SERVER-31360] MatchExpression::getOptimizer() for $or/$and should collapse equivalent clauses Created: 03/Oct/17  Updated: 19/Dec/23  Resolved: 19/Dec/23

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.4.9
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Oskari Petas Assignee: Backlog - Query Optimization
Resolution: Duplicate Votes: 0
Labels: asya
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-75079 Simplify boolean expressions before f... Closed
Related
related to SERVER-60373 Duplicate predicates in query plan fo... Closed
is related to SERVER-31639 Additional optimizations for MatchExp... Closed
is related to SERVER-75079 Simplify boolean expressions before f... Closed
is related to SERVER-49274 Coalesce scans of the same index in p... Backlog
is related to SERVER-22857 eliminate redundant conditions/clause... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

The `$or` operator produces suboptimal results if the parameters in the list are exactly the same, at least when they're empty: `db.collection.count({$or: [ {}, {} ] })`

The way I see it the query planner should be able to see that the parameters are exactly the same and simplify that.

The query `db.collection.count({$or: [ {} ] })` produces a single COUNT stage whereas the more complex query with two empty params creates a SUBPLAN that uses COLLSCAN and only then is able to COUNT so the query is a lot slower.



 Comments   
Comment by Ana Meza [ 18/Apr/23 ]

alexander.ignatyev@mongodb.com when you finish SERVER-75079, please review the tickets linked in this ticket

Comment by Nicholas Zolnierz [ 08/Jan/18 ]

I've retested this after the optimizations from SERVER-31639, and the behavior is unchanged w.r.t to the fast COUNT plan.

The expression {$or: [ { } ]} will push up it's only child whereas the expression {$or: [ {}, {} ]} will fall through to this optimization and result in {$alwaysTrue: 1}. Unfortunately the planner does not have handling for the $alwaysTrue match expression so it still results in a COLLSCAN with a COUNT stage.

> db.test.explain().count({$or: [{}]})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"winningPlan" : {
			"stage" : "COUNT"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "nz_desktop",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}
> db.test.explain().count({$or: [{}, {}]})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$alwaysTrue" : 1
		},
		"winningPlan" : {
			"stage" : "COUNT",
			"inputStage" : {
				"stage" : "COLLSCAN",
				"filter" : {
					"$alwaysTrue" : 1
				},
				"direction" : "forward"
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "nz_desktop",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}
> db.test.explain().count({$alwaysTrue: 1})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$alwaysTrue" : 1
		},
		"winningPlan" : {
			"stage" : "COUNT",
			"inputStage" : {
				"stage" : "COLLSCAN",
				"filter" : {
					"$alwaysTrue" : 1
				},
				"direction" : "forward"
			}
		},
		"rejectedPlans" : [ ]
	},
}

Comment by David Storch [ 25/Oct/17 ]

Hi petazz,

I think there are actually two distinct improvement requests related to your observation. The first request is that the MongoDB query planner simplify redundant branches of an $or. This is more interesting when the branches are not empty. Consider the following example:

> db.c.drop();
true
> db.c.createIndex({a: 1});
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.c.find({$or: [{a: {$lt: 0}}, {a: {$lt: 0}}]});

You could imagine that a naive implementation would construct an OR plan, where each branch of the plan used an identical index scan over the index {a: 1}. This would be wasteful, since the plan would unnecessarily perform the same index scan twice. However, the query engine is smart enough to optimize away redundant branches of an OR plan. The explain output for this query indicates that there is just one IXSCAN in the winning plan:

> db.c.explain().find({$or: [{a: {$lt: 0}}, {a: {$lt: 0}}]});
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$or" : [
				{
					"a" : {
						"$lt" : 0
					}
				},
				{
					"a" : {
						"$lt" : 0
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "SUBPLAN",
			"inputStage" : {
				"stage" : "FETCH",
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"a" : 1
					},
					"indexName" : "a_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"a" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"a" : [
							"[-inf.0, 0.0)"
						]
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "dstorch",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}

This behavior was implemented in 15c72c857 as part of SERVER-13732.

However, you will notice that the redundant branches of the match are not optimized away in the case of a COLLSCAN:

> db.c.explain().find({$or: [{b: {$lt: 0}}, {b: {$lt: 0}}]});
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$or" : [
				{
					"b" : {
						"$lt" : 0
					}
				},
				{
					"b" : {
						"$lt" : 0
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "SUBPLAN",
			"inputStage" : {
				"stage" : "COLLSCAN",
				"filter" : {
					"$or" : [
						{
							"b" : {
								"$lt" : 0
							}
						},
						{
							"b" : {
								"$lt" : 0
							}
						}
					]
				},
				"direction" : "forward"
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "dstorch",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}

You can see that the filter field associated with the COLLSCAN stage does not contain a simplified $or expression. This probably isn't a real performance issue, since $or predicate evaluation will short-circuit. Nevertheless, I think it would still be a valuable improvement to extend our MatchExpression optimization code to collapse equivalent branches of an $and or $or. I am going to convert this ticket to be about tracking this work.

The second related improvement request is specific to empty $or clauses, and the fact that they inhibit a fast COUNT plan. Since an empty clause matches all documents, an $or with an empty clause can simplify to an {$alwaysTrue: 1} predicate. This optimization would allow your query to use a fast COUNT plan with no COLLSCAN. This improvement is already being tracked by SERVER-31639, which I have marked as related to this ticket.

Hopefully this explanation makes sense, and let me know if you have any further questions.

Best,
Dave

Comment by Mark Agarunov [ 03/Oct/17 ]

Hello petazz,

After taking another look at this, I was incorrect that this is a duplicate of SERVER-18777. I've reopened this ticket and set the fixVersion to 'Needs Triage' to be planned against our currently scheduled work. Updates will be posted on this ticket as they are available.

Thanks,
Mark

Comment by Mark Agarunov [ 03/Oct/17 ]

Hello petazz,

Thank you for the report. Looking over this, the underlying cause of the poor performance seems to be due to the SUBPLAN stage not properly evicting cached plans, as described in SERVER-18777. As the behavior described appears to have the same cause, I've closed this ticket as a duplicate, please follow SERVER-18777 for updates on this issue.

Thanks,
Mark

Comment by Oskari Petas [ 03/Oct/17 ]

Argh forgot to change the type to improvement

Generated at Thu Feb 08 04:26:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.