Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-85213

Rewrite $sort+$group with $first/$last to use $top/$bottom

    • Type: Icon: Task Task
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Integration
    • QI 2024-03-18, QI 2024-04-01, QI 2024-04-15, QI 2024-04-29, QI 2024-05-13

      Sometimes users will write queries of the form

      [{$sort: {x:1}},
       {$group: {_id: ..., first: {$first: "$y"}, last: {$last: "$y"}}}]

      Actually, performing a complete sort is wasteful, since we only need the first and last elements per each group, not the total order.

      We should rewrite these queries to the form, with no blocking sort:

      [{$group: {_id: ...,
                first: {$top: {sortBy: "$x", output: "$y"},
                last: {$bottom: {sortBy: "$x", output: "$y"}}}}]

      Update:
      In the rewritten query,
      1) the sortBy clause of $top/$topN and $bottom/$bottomN should be sort keys.
      2) the $first/$firstN should be replaced by $top/$topN group accumulator.
      3) the $last/$lastN should be replaced by $bottom/$bottomN group accumulator.
      4) the group-by keys do not need to be changed at all.
      5) $sort should be removed.

      Other requirements:

      1) This optimization should still work if other accumulators are present
      Update: because other accumulators' results do not depend at all on the sorted input which may span multiple groups.

      2) The optimization should work even if there's a stage between the sort and group, as long as it's correct to do so. (For example, a $addFields)
      Update: if there's any project-like stage in between, we need to track renames and dependencies for sort keys and group keys and referenced fields in group accumulators

      3) The optimization should work for compound group keys

      4) It should work for compound sort keys

      5) This optimization is often applicable to time series, but it is not TS-specific.

      Update:
      This example does not make much sense but it would be weird to not do this if a customer ever writes this pattern of query.
      6) $sort in $sort and $group {$top & $bottom} pattern is absorbed into $group if sort keys of $sort is same as the $top & $bottom's sortBy keys. For example,

      [
        {$sort: {a: 1}},
        {
          $group: {
            _id: ...,
            t: {$top: {sortBy: {a: 1}, output: ...}},
            b: {$bottom: {sortBy: {a: 1}, output:...}}
          }
        }
      ]
      

            Assignee:
            yoonsoo.kim@mongodb.com Yoon Soo Kim
            Reporter:
            ian.boros@mongodb.com Ian Boros
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated: