[SERVER-59146] Enable push down of $match with $expr Created: 05/Aug/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Militsa Sotirova | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Similar to MatchExpression splitting in an aggregation pipeline, we want to be able to split an $expr so that we can support use cases including pushdown in front of an unpack bucket stage if it has both meta fields and control fields as arguments. For example, an $expr that looks likeĀ
should be split into something likeĀ
|
| Comments |
| Comment by Bernard Gorman [ 19/Aug/21 ] | |
|
charlie.swanson: thanks for the heads-up. I'm satisfied to return this to the backlog - we've decided to do | |
| Comment by Charlie Swanson [ 19/Aug/21 ] | |
|
Given the uncertainty around this change without a compelling reason to prioritize, I'm putting this on the backlog for now. We don't believe it to be very easy, but it's likely not that complex (maybe 1 or 2 weeks?). If you'd like to argue for its priority, please put it back into Needs Scheduling with your pitch! | |
| Comment by Bernard Gorman [ 12/Aug/21 ] | |
|
Although it occurs to me that, since we're traversing the entire tree and removing predicates where we cannot rewrite them, we could probably get rid of the dependency on splitMatchByModifiedFields completely, and instead simply clone the entire $match and limit our rewrites to the relevant fields by passing a const std::set<std::string>& into rewriteMatchExpressionTree and rewriteAggExpressionTree. justin.seyster what do you think? | |
| Comment by Bernard Gorman [ 12/Aug/21 ] | |
|
charlie.swanson: the concern here is that when doing the rewrites for change streams, which we have already completed for MatchExpression and are currently in the process of doing for $expr, we use splitMatchByModifiedFields and pass it the set of fields that are (1) relevant to the current filter that we're rewriting, and (2) we know how to perform rewrites for. This works fine in the case of MatchExpression, which can separate out predicates on individual fields. But if there's an $expr expression, then it's all-or-nothing; if there are any fields referenced in the $expr that are not present in the set of fields we request, we cannot split them apart. Say for example there's a filter like so:
... but we only request expressions on clusterTime. Despite the fact that it would be valid to split this $expr apart and move the clusterTime filter forward, the $expr cannot be split and therefore won't be pushed down into the oplog filter. | |
| Comment by Charlie Swanson [ 12/Aug/21 ] | |
|
bernard.gorman also points out that this optimization would be useful in the context of some change streams rewrites - can you elaborate on what those are? Is it around predicates involving $tsSeconds which we expect to be popular? | |
| Comment by Charlie Swanson [ 12/Aug/21 ] | |
|
Interesting points david.percy! cc christopher.harris this came up in some similar recent conversations around SBE and API Versioning. | |
| Comment by David Percy [ 05/Aug/21 ] | |
|
This reminds me of SERVER-37530. We don't (intend to) guarantee that $and short-circuits, but people may assume this kind of idiom is valid:
If $and doesn't short-circuit, then the $isNumber check doesn't prevent $add from erroring on non-number inputs. ExpressionAnd::evaluate() does just happen to short circuit, so rewriting it to MatchExpression::AND could make the problem reveal itself more often. |