  1. Core Server
  2. SERVER-82570

Bucket-level filters in time-series are translated to SBE as trivially "true"

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL

      "c" is a measurement field.
      Notice the filter stage at [1]: its predicate is trivially true, so the bucket-level filter rejects no buckets.

      > db.ts.explain().aggregate([{$match: {c: 7}}, {$project: {name: 1}}]).queryPlanner.winningPlan.slotBasedPlan.stages
      [3] project [s14 = makeBsonObj(MakeObjSpec(["_id", "name"], Closed, ReturnNothing), s13)]
      [2] mkbson s13 [c = s11, _id = s10, name = s12] true false
      [2] block_to_row blocks[s5, s6, s7] row[s10, s11, s12] s9
      [2] project [s9 = cellFoldValues_F(valueBlockFillEmpty(valueBlockEqScalar(cellBlockGetFlatValuesBlock(s8), 7L), false), s8)]
      [2] ts_bucket_to_cellblock s3 pathReqs[s5 = Get(_id)/Id, s6 = Get(c)/Id, s7 = Get(name)/Id, s8 = Get(c)/Traverse/Id]
      [1] filter {true}
      [1] scan s3 s4 none none none none none none lowPriority [s2 = control] @"c0689989-6714-4fbe-b85f-9cbe7482d3bc" true false
      
    • QE 2023-11-13, QE 2023-11-27
    • 135

      The $_internalExpr* family of comparison operators was initially introduced as a vehicle for index-based optimization while keeping the original comparison expressions intact. This means that, when the index optimizations don't apply, it's safe to evaluate the internal comparisons as trivially true, and they were implemented as such in SBE for performance reasons.
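
      The reasoning above can be sketched with a toy model. This is not server code: Pred, internalEqTrivial, exprEq, and countPassing are hypothetical names used only for illustration. It shows that a trivially true internal comparison is harmless when the original comparison is still in the conjunction, and that on its own it filters nothing, which matches the filter {true} in the plan above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy predicate over a single integer field value (hypothetical, not server code).
using Pred = std::function<bool(long long)>;

// Models an internal comparison that is evaluated as trivially true.
inline Pred internalEqTrivial(long long /*rhs*/) {
    return [](long long) { return true; };
}

// Models the original, fully evaluated comparison.
inline Pred exprEq(long long rhs) {
    return [rhs](long long v) { return v == rhs; };
}

// Counts the values that satisfy every predicate in the conjunction.
inline int countPassing(const std::vector<long long>& values,
                        const std::vector<Pred>& conjunction) {
    int passing = 0;
    for (long long v : values) {
        bool ok = true;
        for (const auto& p : conjunction) {
            ok = ok && p(v);
        }
        if (ok) {
            ++passing;
        }
    }
    return passing;
}
```

      With values {5, 7, 9}, the conjunction of internalEqTrivial(7) and exprEq(7) lets one value through, while internalEqTrivial(7) alone lets all three through.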

      The problem is that in time-series the internal comparisons are inserted at the bucket level and don't have matching non-internal comparisons, so to achieve the goal of reducing the number of buckets to be unpacked they must be implemented fully. SERVER-62058 attempted to do so but didn't take into account that the expressions might be serialized and then restored, thus losing the mustExecute flag.
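
      A minimal sketch of how a serialization roundtrip can drop such a flag, under the assumption that the flag lives only on the in-memory object and has no serialized form. InternalEqSketch, SerializedSketch, serializeSketch, and parseSketch are hypothetical stand-ins; a std::map stands in for BSON.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory form of an internal comparison.
struct InternalEqSketch {
    std::string path;
    long long rhs = 0;
    bool mustExecute = false;  // assumption: in-memory only, never serialized
};

// Stand-in for the serialized (BSON) form: path -> right-hand side.
using SerializedSketch = std::map<std::string, long long>;

// Writes out the path and the constant; the flag has nowhere to go.
inline SerializedSketch serializeSketch(const InternalEqSketch& e) {
    SerializedSketch out;
    out[e.path] = e.rhs;
    return out;
}

// Rebuilds the expression; mustExecute falls back to its default (false).
inline InternalEqSketch parseSketch(const SerializedSketch& s) {
    InternalEqSketch e;
    e.path = s.begin()->first;
    e.rhs = s.begin()->second;
    return e;
}
```

      In this model, parseSketch(serializeSketch(e)) returns an expression with the same path and constant as e, but with mustExecute reset to false even if it was true on e.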

      So far I've identified two places where the serialization roundtrip happens:
      1. In DocumentSourceInternalUnpackBucket::doOptimizeAt, when creating DocumentSourceMatch from the loosePredicate. This serialization can be easily avoided by adding a create method on DocumentSourceMatch that takes a unique_ptr to an expression rather than BSON (this serialization seems to be totally unnecessary).
      2. In buildInnerQueryExecutorGeneric, when the queryObj is extracted from the pipeline (const BSONObj queryObj = pipeline->getInitialQuery()). I'm not sure how to avoid this one...
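
      The suggestion in item 1 could look roughly like the following. This is a toy sketch, not the real DocumentSourceMatch API: ExprSketch, MatchStageSketch, createFromSerialized, and create are hypothetical names, and the serialized form is reduced to a single constant. It assumes C++14 for std::make_unique.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression carrying an in-memory-only flag.
struct ExprSketch {
    long long rhs = 0;
    bool mustExecute = false;
};

// Hypothetical stand-in for a match stage that owns its expression.
class MatchStageSketch {
public:
    // Models construction from a serialized form: the expression is re-parsed,
    // so anything not present in the serialized form is reset to its default.
    static MatchStageSketch createFromSerialized(long long serializedRhs) {
        auto parsed = std::make_unique<ExprSketch>();
        parsed->rhs = serializedRhs;
        return MatchStageSketch(std::move(parsed));
    }

    // Models the suggested overload: take ownership of the already-built
    // expression, so no roundtrip happens and the flag is kept.
    static MatchStageSketch create(std::unique_ptr<ExprSketch> expr) {
        return MatchStageSketch(std::move(expr));
    }

    const ExprSketch& expr() const { return *_expr; }

private:
    explicit MatchStageSketch(std::unique_ptr<ExprSketch> expr)
        : _expr(std::move(expr)) {}

    std::unique_ptr<ExprSketch> _expr;
};
```

      Passing an expression with mustExecute set through create keeps the flag, whereas createFromSerialized yields an expression with the flag cleared. This does not address item 2, where the query is extracted as BSON from the pipeline.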

      Rather than trying to preserve the flag through serialization, should we look into adding a new family of internal comparison ops that are always evaluated? This way the rewrites for index-based optimizations can keep the trivially-implemented ones and time-series can opt in to use the ones that are always evaluated.
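
      One way to picture this proposal: if "always evaluated" is encoded in the operator name rather than in a side flag, it survives any serialize/parse roundtrip by construction. The sketch below is a toy model; InternalOpSketch, toNameSketch, fromNameSketch, and evaluateSketch are hypothetical, and "$_internalAlwaysEvalEq" is an invented name used only for illustration, not an existing operator.

```cpp
#include <cassert>
#include <string>

// Two hypothetical operator kinds: the existing trivially-true one and a
// proposed always-evaluated one.
enum class InternalOpSketch { kTrivialEq, kAlwaysEvaluatedEq };

// Serialization writes the operator name; the second name is invented here.
inline std::string toNameSketch(InternalOpSketch op) {
    return op == InternalOpSketch::kTrivialEq ? "$_internalExprEq"
                                              : "$_internalAlwaysEvalEq";
}

// Parsing recovers the kind from the name alone, so no flag is needed.
inline InternalOpSketch fromNameSketch(const std::string& name) {
    return name == "$_internalAlwaysEvalEq" ? InternalOpSketch::kAlwaysEvaluatedEq
                                            : InternalOpSketch::kTrivialEq;
}

// The trivial kind passes everything; the always-evaluated kind compares.
inline bool evaluateSketch(InternalOpSketch op, long long rhs, long long value) {
    if (op == InternalOpSketch::kTrivialEq) {
        return true;
    }
    return value == rhs;
}
```

      Under this model, index-based rewrites would keep emitting the trivial kind, and the time-series bucket-level rewrite would emit the always-evaluated kind.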

            Assignee:
            ian.boros@mongodb.com Ian Boros
            Reporter:
            irina.yatsenko@mongodb.com Irina Yatsenko (Inactive)
            Votes:
            0
            Watchers:
            7
