[SERVER-59146] Enable push down of $match with $expr Created: 05/Aug/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Militsa Sotirova Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-37530 Provide a way to cause a well-defined... Backlog
related to SERVER-58134 Map agg comparison predicates on time... Closed
is related to SERVER-58134 Map agg comparison predicates on time... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

Similar to MatchExpression splitting in an aggregation pipeline, we want to be able to split an $expr so that we can support use cases including pushdown in front of an unpack bucket stage if it has both meta fields and control fields as arguments. For example, an $expr that looks likeĀ 

{$expr: {$and: [{$eq: ["$meta.host", "host_0"]}, {$eq: ["$a", 0]}]}}

should be split into something likeĀ 

{$and: [{$expr: {$eq: ["$meta.host", "host_0"]}, {$expr: {$eq: ["$a", 0]}}]}



 Comments   
Comment by Bernard Gorman [ 19/Aug/21 ]

charlie.swanson: thanks for the heads-up. I'm satisfied to return this to the backlog - we've decided to do SERVER-59433 instead, which should give change streams optimization the ability to work around this issue, and ensure that all feasible expressions are pulled forward and rewritten.

Comment by Charlie Swanson [ 19/Aug/21 ]

Given the uncertainty around this change without a compelling reason to prioritize, I'm putting this on the backlog for now. We don't believe it to be very easy, but it's likely not that complex (maybe 1 or 2 weeks?). If you'd like to argue for its priority, please put it back into Needs Scheduling with your pitch!

Comment by Bernard Gorman [ 12/Aug/21 ]

Although it occurs to me that, since we're traversing the entire tree and removing predicates where we cannot rewrite them, we could probably get rid of the dependency on splitMatchByModifiedFields completely, and instead simply clone the entire $match and limit our rewrites to the relevant fields by passing a const std::set<std::string>& into rewriteMatchExpressionTree and rewriteAggExpressionTree. justin.seyster what do you think?

Comment by Bernard Gorman [ 12/Aug/21 ]

charlie.swanson: the concern here is that when doing the rewrites for change streams, which we have already completed for MatchExpression and are currently in the process of doing for $expr, we use splitMatchByModifiedFields and pass it the set of fields that are (1) relevant to the current filter that we're rewriting, and (2) we know how to perform rewrites for. This works fine in the case of MatchExpression, which can separate out predicates on individual fields. But if there's an $expr expression, then it's all-or-nothing; if there are any fields referenced in the $expr that are not present in the set of fields we request, we cannot split them apart.

Say for example there's a filter like so:

{$expr: {$and: [{$mod: ["$clusterTime", 3], 0}, {$in: ["$operationType", ["insert", "update"]]}]}}

... but we only request expressions on clusterTime. Despite the fact that it would be valid to split this $expr apart and move the clusterTime filter forward, the $expr cannot be split and therefore won't be pushed down into the oplog filter.

Comment by Charlie Swanson [ 12/Aug/21 ]

bernard.gorman also points out that this optimization would be useful in the context of some change streams rewrites - can you elaborate on what those are? Is it around predicates involving $tsSeconds which we expect to be popular?

Comment by Charlie Swanson [ 12/Aug/21 ]

Interesting points david.percy! cc christopher.harris this came up in some similar recent conversations around SBE and API Versioning.

Comment by David Percy [ 05/Aug/21 ]

This reminds me of SERVER-37530. We don't (intend to) guarantee that $and short-circuits, but people may assume this kind of idiom is valid:

db.stuff.find({$expr: {$and: [ {$isNumber: "$x"}, {$eq: [0, {$add: ["$x", 1]}]} ]}})

If $and doesn't short-circuit, then the $isNumber check doesn't prevent $add from erroring on non-number inputs.

ExpressionAnd::evaluate() does just happen to short circuit, so rewriting it to MatchExpression::AND could make the problem reveal itself more often.

Generated at Thu Feb 08 05:46:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.