[SERVER-62058] SBE for time-series will need $_internalExpr to be more selective Created: 14/Dec/21  Updated: 06/Dec/23  Resolved: 03/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Percy Assignee: Parker Felix
Resolution: Fixed Votes: 0
Labels: sbe
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-79699 Support wholeBucketFilter in the Unpa... Closed
Duplicate
is duplicated by SERVER-52745 Implement any MatchExpressions produc... Closed
Related
related to SERVER-52745 Implement any MatchExpressions produc... Closed
related to SERVER-60405 Support clustered collection querying... Backlog
related to SERVER-79699 Support wholeBucketFilter in the Unpa... Closed
Assigned Teams:
Query Integration
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QI 2023-09-04, QI 2023-09-18
Participants:

 Description   

I don't believe this is a correctness or performance bug, currently. I think it will be a performance bug in the future if we enable SBE on time-series collections.

------

I noticed SBE implements the $_internalExpr operators differently than classic:

> db.c.find({x: {$_internalExprEq: 6}})
{ "_id" : ObjectId("61aa7b92b996804af1df48c9") }
{ "_id" : ObjectId("61aa8eb9c9ca709bfcca03e4"), "x" : null }
{ "_id" : ObjectId("61aa8ebbc9ca709bfcca03e5"), "x" : undefined }
{ "_id" : ObjectId("61aa8efcc9ca709bfcca03e6"), "x" : 6 }
{ "_id" : ObjectId("61aa910b6f5ffffd0b22b2a0"), "x" : 1 }
{ "_id" : ObjectId("61aa910d6f5ffffd0b22b2a1"), "x" : 100 }
> db.c.find({x: {$_internalExprEq: 6}}).explain().queryPlanner.winningPlan.slotBasedPlan.stages
[1] filter {true} 
[1] scan s4 s5 none none none none [] @"25bb7ad1-c1c1-42fa-82cc-80e871af7bcb" true false 

In the classic engine only {x: 6} would match.
I noticed because in time-series rewrites, we use these operators to filter down the set of buckets before unpacking. We're relying on the specific behaviors documented here: https://github.com/mongodb/mongo/blob/93e01aad220ee1ab5ae809acd09426da46e028d0/src/mongo/db/matcher/expression_internal_expr_comparison.h#L36-L53

I don't think it's a problem yet, because it also looks like we don't generate SBE plans for time-series collections. But if we do plan to enable SBE+timeseries I think we'll need to change this.



 Comments   
Comment by Githook User [ 03/Oct/23 ]

Author:

{'name': 'Parker Felix', 'email': 'parker.felix@mongodb.com', 'username': 'parker-felix'}

Message: SERVER-62058 Implement $_internalExpr in SBE
Branch: master
https://github.com/mongodb/mongo/commit/027de410329bcfd13449bebb6cbee43f8524644c

Comment by Irina Yatsenko (Inactive) [ 05/Sep/23 ]

gil.alon@mongodb.com, you've been looking at the TS rewrites recently and this ticket is related to them. As Yoonsoo mentioned above, there is an implementation of the internal comp ops in the block processing spike branch: https://github.com/10gen/mongo/commit/7dc5f06c51177da7fb4b9fc748b8befd96382cdb#diff-505b6901656a3b7a189eb468f202966c69a3dafd3004fef097dbd4fd3089f449 – we should be able to take it pretty much "as is". In addition to that we should add a perf workload that relies on the bucket filtering to be done efficiently (I suspect that all TSBS workloads end up using an index so the lack of support of $_internalExpr* in SBE doesn't affect them)

Comment by Irina Yatsenko (Inactive) [ 05/Sep/23 ]

The ops from the $_internalExpr* family are meant to be non-type-bracketing because they are used to rewrite corresponding non-internal ops used in $expr, which is non-type bracketing. The rewrites of $expr into "plain" $match are done to enable use of indexes. The rewritten terms are prepended to the original match expression with $expr so that we get a more complicated but equivalent filter (with the potential benefit of the non-$expr terms using the indexes). Thus, the trivial implementation of the ops in SBE (the filters always return "true") doesn't affect non-TS cases: either the index is used and the internal expression sets the bounds correctly or the index isn't used in which case the original expression is still applied.

The TS rewrites are different in the sense that they might replace the original filter with the ones at the bucket level (if the "wholeBucketFilter" is used) or rely on the bucket level filters even in absence of indexes on the corresponding fields. The replacement might produce incorrect results (as the rewritten filters are more permissive), but even if wholeBucketFilter isn't used, the lack of filtering at the bucket level would be super inefficient.

Action items:
1. provide a correct (that is, matching to the classic) implementation of $_internalExpr* ops
2. consider not using $_internalExpr* for rewrites of non-$expr match

Note: doing p2 doesn't remove the necessity of doing p1 because we still want to push to bucket level as many parts of $expr as possible

Comment by Kyle Suarez [ 29/Mar/23 ]

After discussion with james.wahlin@mongodb.com, adding this to PM-1946.

Comment by David Percy [ 21/Dec/21 ]

It looks like PM-2235 is the project that includes enabling SBE on time-series.

Comment by Joe Kanaan [ 21/Dec/21 ]

david.percy, please feel free to move this to a more appropriate epic if you deem fit.

Generated at Thu Feb 08 05:54:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.