[SERVER-82570] Bucket-level filters in time-series are translated to SBE as trivially "true" Created: 30/Oct/23  Updated: 19/Nov/23  Resolved: 17/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Irina Yatsenko (Inactive) Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-82899 Investigate SBE bucket level filters ... Closed
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

"c" is a measurement field.
Notice the filter stage at [1] in the explain output below: the filter is trivially true.

> db.ts.explain().aggregate([{$match: {c: 7}}, {$project: {name: 1}}]).queryPlanner.winningPlan.slotBasedPlan.stages
[3] project [s14 = makeBsonObj(MakeObjSpec(["_id", "name"], Closed, ReturnNothing), s13)]
[2] mkbson s13 [c = s11, _id = s10, name = s12] true false
[2] block_to_row blocks[s5, s6, s7] row[s10, s11, s12] s9
[2] project [s9 = cellFoldValues_F(valueBlockFillEmpty(valueBlockEqScalar(cellBlockGetFlatValuesBlock(s8), 7L), false), s8)]
[2] ts_bucket_to_cellblock s3 pathReqs[s5 = Get(_id)/Id, s6 = Get(c)/Id, s7 = Get(name)/Id, s8 = Get(c)/Traverse/Id]
[1] filter {true}
[1] scan s3 s4 none none none none none none lowPriority [s2 = control] @"c0689989-6714-4fbe-b85f-9cbe7482d3bc" true false

Sprint: QE 2023-11-13, QE 2023-11-27
Participants:
Linked BF Score: 135

 Description   

The $_internalExpr* family of comparison operators was initially introduced as a vehicle to enable index-based optimizations while keeping the original comparison expressions intact. This means that, when the index optimizations don't apply, it's safe to evaluate the internal comparisons as trivially true, and they were implemented as such in SBE for performance reasons.

The problem is that in time-series the internal comparisons are inserted at the bucket level and don't have matching non-internal comparisons, so to achieve the goal of reducing the number of buckets to be unpacked, they must be implemented fully. SERVER-62058 attempted to do so but didn't take into account that the expressions might be serialized and then restored, thus losing the mustExecute flag.
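To make the cost concrete, here is a minimal sketch of what a fully evaluated bucket-level filter buys for the {c: 7} query in the repro. It assumes each bucket carries a min/max summary for "c" and that the bucket-level predicate is a range check against that summary; BucketControl, bucketMayMatch and bucketsToUnpack are hypothetical names for illustration, not server code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-bucket summary of the measurement field "c".
struct BucketControl {
    long long minC;  // smallest value of "c" stored in the bucket
    long long maxC;  // largest value of "c" stored in the bucket
};

// Assumed bucket-level predicate for {c: value}: the bucket can only contain a
// matching measurement if value lies within [minC, maxC].
bool bucketMayMatch(const BucketControl& b, long long value) {
    return b.minC <= value && value <= b.maxC;
}

// Counts the buckets that would have to be unpacked. With evaluateFilter set to
// false the predicate is treated as trivially true, like "filter {true}" in the
// explain output, so every bucket is unpacked.
int bucketsToUnpack(const std::vector<BucketControl>& buckets,
                    long long value,
                    bool evaluateFilter) {
    int count = 0;
    for (const auto& b : buckets) {
        if (!evaluateFilter || bucketMayMatch(b, value)) {
            ++count;
        }
    }
    return count;
}
```

With buckets covering [0, 5], [6, 9] and [10, 20], a query for c == 7 unpacks one bucket when the filter is evaluated and all three when it is trivially true.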

So far I've identified two places where the serialization roundtrip happens:
1. In DocumentSourceInternalUnpackBucket::doOptimizeAt, when creating DocumentSourceMatch from the loosePredicate. This serialization can easily be avoided by adding a create method on DocumentSourceMatch that takes a unique_ptr to an expression rather than BSON (the serialization seems to be totally unnecessary).
2. In buildInnerQueryExecutorGeneric, when the queryObj is extracted from the pipeline (const BSONObj queryObj = pipeline->getInitialQuery()). I'm not sure how to avoid this one...
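The roundtrip problem in the two places above can be illustrated with a small, self-contained sketch. Everything here is a simplified assumption: InternalCmp, MatchStage, serialize, parse, createFromSerialized and createFromExpression are hypothetical names, and a "path:value" string stands in for BSON. The only point is that a flag which lives in memory but is not written during serialization comes back with its default value, while handing over the expression itself keeps it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for an internal comparison expression.
struct InternalCmp {
    std::string path;
    long long value = 0;
    bool mustExecute = false;  // in-memory only; not part of the serialized form
};

// Serializes to "path:value" (a stand-in for BSON). The flag is not written.
std::string serialize(const InternalCmp& e) {
    return e.path + ":" + std::to_string(e.value);
}

// Rebuilds the expression from its serialized form; mustExecute keeps its default.
std::unique_ptr<InternalCmp> parse(const std::string& s) {
    const auto pos = s.rfind(':');
    auto e = std::make_unique<InternalCmp>();
    e->path = s.substr(0, pos);
    e->value = std::stoll(s.substr(pos + 1));
    return e;
}

// Hypothetical stand-in for a match stage that owns its expression.
struct MatchStage {
    std::unique_ptr<InternalCmp> expr;
};

// Creation from the serialized form: the expression is re-parsed, so the flag is lost.
MatchStage createFromSerialized(const std::string& s) {
    return MatchStage{parse(s)};
}

// Creation that takes ownership of the expression: no roundtrip, the flag survives.
MatchStage createFromExpression(std::unique_ptr<InternalCmp> e) {
    return MatchStage{std::move(e)};
}
```

This corresponds to the fix suggested for the first place only. For the second place the query is extracted in serialized form, so a sketch like this does not show a way around it.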

Rather than trying to preserve the flag through serialization, should we look into adding a new family of internal comparison ops that are always evaluated? This way the rewrites for index-based optimizations can keep the trivially-implemented ones and time-series can opt in to use the ones that are always evaluated.
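A rough sketch of that alternative, under the assumption that the behavior is encoded in the operator name itself so that it survives serialization without any extra flag. "$_internalExprGte" is an existing operator name; "$_alwaysEvaluatedGte", CmpKind, opName, parseOpName and evaluate are hypothetical and only illustrate the idea, not a proposed implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class CmpKind {
    InternalExprGte,     // existing style: may be evaluated as trivially true
    AlwaysEvaluatedGte,  // hypothetical new operator: always evaluated
};

// The operator name is what gets serialized, so the kind is preserved by it.
std::string opName(CmpKind kind) {
    return kind == CmpKind::InternalExprGte ? "$_internalExprGte"
                                            : "$_alwaysEvaluatedGte";
}

// Restores the kind from a serialized operator name.
CmpKind parseOpName(const std::string& name) {
    if (name == "$_internalExprGte") return CmpKind::InternalExprGte;
    if (name == "$_alwaysEvaluatedGte") return CmpKind::AlwaysEvaluatedGte;
    throw std::invalid_argument("unknown operator: " + name);
}

// The trivially implemented kind ignores its operands; the other one compares them.
bool evaluate(CmpKind kind, long long fieldValue, long long bound) {
    if (kind == CmpKind::InternalExprGte) {
        return true;
    }
    return fieldValue >= bound;
}
```

In this sketch the index-oriented rewrites would keep emitting the first kind, and time-series bucket-level predicates would opt in to the second.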



 Comments   
Comment by Githook User [ 17/Nov/23 ]

Author: Ian Boros <ian.boros@mongodb.com> (borosaurus)

Message: SERVER-82570 Remove extraneous InternalExpr predicates during query planning
Branch: master
https://github.com/mongodb/mongo/commit/d8ab736a83e38c69768848a8713afa21c0c3b863

Comment by Ian Boros [ 07/Nov/23 ]

https://github.com/10gen/mongo/pull/16526
