[SERVER-31360] MatchExpression::getOptimizer() for $or/$and should collapse equivalent clauses Created: 03/Oct/17 Updated: 19/Dec/23 Resolved: 19/Dec/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.4.9 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Oskari Petas | Assignee: | Backlog - Query Optimization |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | asya | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Description |
|
The `$or` operator produces suboptimal results if the parameters in the list are exactly the same, at least when they're empty: `db.collection.count({$or: [ {}, {} ] })` The way I see it the query planner should be able to see that the parameters are exactly the same and simplify that. The query `db.collection.count({$or: [ {} ] })` produces a single COUNT stage whereas the more complex query with two empty params creates a SUBPLAN that uses COLLSCAN and only then is able to COUNT so the query is a lot slower. |
| Comments |
| Comment by Ana Meza [ 18/Apr/23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
alexander.ignatyev@mongodb.com when you finish | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nicholas Zolnierz [ 08/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've retested this after the optimizations from The expression {$or: [ { } ]} will push up it's only child whereas the expression {$or: [ {}, {} ]} will fall through to this optimization and result in {$alwaysTrue: 1}. Unfortunately the planner does not have handling for the $alwaysTrue match expression so it still results in a COLLSCAN with a COUNT stage.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 25/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi petazz, I think there are actually two distinct improvement requests related to your observation. The first request is that the MongoDB query planner simplify redundant branches of an $or. This is more interesting when the branches are not empty. Consider the following example:
You could imagine that a naive implementation would construct an OR plan, where each branch of the plan used an identical index scan over the index {a: 1}. This would be wasteful, since the plan would unnecessarily perform the same index scan twice. However, the query engine is smart enough to optimize away redundant branches of an OR plan. The explain output for this query indicates that there is just one IXSCAN in the winning plan:
This behavior was implemented in 15c72c857 as part of However, you will notice that the redundant branches of the match are not optimized away in the case of a COLLSCAN:
You can see that the filter field associated with the COLLSCAN stage does not contain a simplified $or expression. This probably isn't a real performance issue, since $or predicate evaluation will short-circuit. Nevertheless, I think it would still be a valuable improvement to extend our MatchExpression optimization code to collapse equivalent branches of an $and or $or. I am going to convert this ticket to be about tracking this work. The second related improvement request is specific to empty $or clauses, and the fact that they inhibit a fast COUNT plan. Since an empty clause matches all documents, an $or with an empty clause can simplify to an {$alwaysTrue: 1} predicate. This optimization would allow your query to use a fast COUNT plan with no COLLSCAN. This improvement is already being tracked by Hopefully this explanation makes sense, and let me know if you have any further questions. Best, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mark Agarunov [ 03/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello petazz, After taking another look at this, I was incorrect that this is a duplicate of SERVER-18777. I've reopened this ticket and set the fixVersion to 'Needs Triage' to be planned against our currently scheduled work. Updates will be posted on this ticket as they are available. Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mark Agarunov [ 03/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello petazz, Thank you for the report. Looking over this, the underlying cause of the poor performance seems to be due to the SUBPLAN stage not properly evicting cached plans, as described in SERVER-18777. As the behavior described appears to have the same cause, I've closed this ticket as a duplicate, please follow SERVER-18777 for updates on this issue. Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Oskari Petas [ 03/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Argh forgot to change the type to improvement |