  1. Core Server
  2. SERVER-86618

Investigate/fix an issue where the last project stage of $group produces an SBE object when it is supposed to produce a BSON object

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Integration
    • ALL
    • > db.a.insert([{a: 1}, {a: 2}, {a: 3}]);
      > db.a.explain().aggregate([{$group: {_id: "$a", o: {$sum: 1}}}, {$sort: {_id: 1}}]);
    • QI 2024-02-19

      I noticed that when $group is the last document source of the part of the pipeline lowered to SBE, we create a project stage with newObj, like project [sN = newObj("_id", sL, "o", sM)], instead of generating a BSON result. For example,

      > db.a.explain().aggregate([{$group: {_id: "$a", o: {$sum: 1}}}, {$sort: {_id: 1}}]).stages[0].$cursor.queryPlanner.winningPlan.slotBasedPlan.stages;
      [3] project [s7 = newObj("_id", s4, "o", s5)]
      [3] group [s4] [s5 = sum(1L)] spillSlots[s6] mergingExprs[sum(s6)]
      [3] project [s4 = (s1 ?: null)]
      [1] scan s2 s3 none none none none none none lowPriority [s1 = a] @"3e522460-3540-4d0c-aa9a-cc7995b8c494" true false
      

      In the above pipeline, $group is the last lowered source because $sort can’t be lowered.
      This happens because of the following code in QueryPlanner::extendWithAggPipeline():

              bool isLastSource =
                  (i + 1 == (innerPipelineStages.size())) && query.containsEntirePipeline();
      

      This actually works because SBE’s fetchNextImpl<BSONObj>() can handle an SBE Object-typed result here.
      But my question is: why do we consider a lowered source to be the last one only when the entire pipeline is lowered? It seems to me that a lowered source is the last one simply when it is the last of the lowered sources.
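
      To make the question concrete, here is a minimal sketch that contrasts the quoted condition with the alternative. It is not the server's code: isLastSourceCurrent, isLastSourceProposed, and their parameters are hypothetical stand-ins for i, innerPipelineStages.size(), and query.containsEntirePipeline().

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration; neither function exists in the server.

// Mirrors the quoted condition: stage i counts as the last source only if it is
// the final lowered stage AND the lowered part covers the entire pipeline.
bool isLastSourceCurrent(std::size_t i,
                         std::size_t numLoweredStages,
                         bool containsEntirePipeline) {
    return (i + 1 == numLoweredStages) && containsEntirePipeline;
}

// The alternative raised above: stage i counts as the last source whenever it is
// the final lowered stage, regardless of which stages remain outside SBE.
bool isLastSourceProposed(std::size_t i, std::size_t numLoweredStages) {
    return i + 1 == numLoweredStages;
}
```

      For the example pipeline, assuming only $group is lowered and $sort stays outside SBE, the call isLastSourceCurrent(0, 1, false) returns false while isLastSourceProposed(0, 1) returns true. Whether the second form is safe for every consumer of the SBE result is exactly what this ticket asks.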

      To give some context, I’m investigating a perf regression of timeseries queries with the new TCMalloc allocator and found that the allocation in fetchNextImpl<BSONObj>(), caused by converting the SBE Object to a BSONObj, accounts for 14% of all allocations. We allocate memory once for the SBE Object and then allocate memory again for the result BSONObj, so I want to avoid the second BSONObj allocation here.

            Assignee:
            yoonsoo.kim@mongodb.com Yoon Soo Kim
            Reporter:
            yoonsoo.kim@mongodb.com Yoon Soo Kim
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: