  1. Core Server
  2. SERVER-88613

Introduce a new syntax to $group to define the common sort key for multiple $top & $bottom accumulators

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Query Optimization

      This is an idea to further improve the SERVER-88087 optimization. The benefits go beyond optimization: the proposal also gives users a clear way to express a sort pattern shared by multiple $top/$bottom accumulators.

      The motivation is to avoid generating the same sort key once per accumulator when multiple accumulators share that sort key.

      In SERVER-88087, we group $top(N)/$bottom(N) accumulators that share a sort pattern into one $top, one $topN, one $bottom, or one $bottomN, because of a limitation in the current $group specification. So even when every accumulator uses the same sort key, we may need to generate that key up to 4 times.
      Update: the worst case is actually unbounded, because for $topN/$bottomN the "n" argument has to be part of the grouping key.
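
      The counting argument above can be sketched as follows. This is only an illustration, not the server's implementation: `countSortKeyGenerations` is a hypothetical helper, and the field names and `_id` expression in the sample spec are made up. It assumes accumulators can be merged only when they share the operator, the sort pattern, and (for $topN/$bottomN) the "n" argument.

```javascript
// Hypothetical helper (not server code): counts how many times a sort key
// would be generated per document if accumulators can only be merged when
// they share operator, sort pattern, and the "n" argument.
function countSortKeyGenerations(groupSpec) {
  const ops = ["$top", "$bottom", "$topN", "$bottomN"];
  const buckets = new Set();
  for (const [field, acc] of Object.entries(groupSpec)) {
    if (field === "_id" || acc === null || typeof acc !== "object") continue;
    const op = Object.keys(acc)[0];
    if (!ops.includes(op)) continue;
    const { sortBy, n } = acc[op];
    // JSON.stringify keeps field order, which matters for sort patterns.
    buckets.add(JSON.stringify([op, sortBy, n === undefined ? null : n]));
  }
  return buckets.size;
}

// Every accumulator below uses the same sort pattern.
const sortBy = { time: 1, tag: -1 };
const spec = {
  _id: "$sensor", // hypothetical grouping expression
  a: { $top: { sortBy, output: "$m" } },
  b: { $top: { sortBy, output: "$i" } }, // merges with "a"
  c: { $bottom: { sortBy, output: "$m" } },
  d: { $topN: { n: 2, sortBy, output: "$m" } },
  e: { $topN: { n: 3, sortBy, output: "$m" } }, // different n, separate bucket
  f: { $bottomN: { n: 2, sortBy, output: "$m" } },
};
const generations = countSortKeyGenerations(spec); // 5
```

      Each additional distinct "n" adds another bucket, which is why the worst case is not bounded by 4.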

      If $group were able to define a common sort key for multiple $top(N)/$bottom(N) accumulators using a new syntax, we could generate the sort key only once and let each $top(N) and $bottom(N) refer to it.

      Off the top of my head, I can think of this syntax:

      {
        $group: {
          sortKeys: {k1: {time: 1, tag: -1}},
          _id: ...,
          tm: {$top: {sortBy: "$$k1", output: "$m"}},
          bi: {$bottom: {sortBy: "$$k1", output: "$i"}}
        }
      }
      
      // This is equivalent to the following syntax
      {
        $group: {
          _id: ...,
          tm: {$top: {sortBy: {time: 1, tag: -1}, output: "$m"}},
          bi: {$bottom: {sortBy: {time: 1, tag: -1}, output: "$i"}}
        }
      }
      

      Basically, the common sort pattern does not need to be part of each accumulator's spec. Different $top and $bottom accumulators may use different sort keys, so the new syntax must allow several named sort keys to be defined. An accumulator refers to a defined sort key by prefixing its name with "$$".
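
      The equivalence between the two forms can be sketched as a small rewrite that inlines each "$$name" reference. This is a sketch under assumptions, not a proposed implementation: `desugarSortKeys` is a hypothetical helper, and `"$sensor"` stands in for the `_id` expression that the example above leaves unspecified.

```javascript
// Hypothetical rewrite: expands the proposed `sortKeys` field of a $group
// spec into the existing syntax by inlining each "$$name" reference.
function desugarSortKeys(groupSpec) {
  const { sortKeys = {}, ...rest } = groupSpec;
  const ops = ["$top", "$bottom", "$topN", "$bottomN"];
  const out = {};
  for (const [field, acc] of Object.entries(rest)) {
    const isObject = acc !== null && typeof acc === "object";
    const op = isObject ? Object.keys(acc)[0] : undefined;
    if (field === "_id" || !ops.includes(op)) {
      out[field] = acc; // not a $top/$bottom accumulator, keep as-is
      continue;
    }
    const ref = acc[op].sortBy;
    if (typeof ref === "string" && ref.startsWith("$$")) {
      const name = ref.slice(2);
      if (!Object.prototype.hasOwnProperty.call(sortKeys, name)) {
        throw new Error(`undefined sort key: ${name}`);
      }
      out[field] = { [op]: { ...acc[op], sortBy: sortKeys[name] } };
    } else {
      out[field] = acc; // inline sort pattern, nothing to resolve
    }
  }
  return out;
}

const proposed = {
  sortKeys: { k1: { time: 1, tag: -1 } },
  _id: "$sensor", // hypothetical grouping expression
  tm: { $top: { sortBy: "$$k1", output: "$m" } },
  bi: { $bottom: { sortBy: "$$k1", output: "$i" } },
};
const desugared = desugarSortKeys(proposed);
```

      Here `desugared` has the same shape as the second form above, with `{time: 1, tag: -1}` inlined into both accumulators. A real implementation would presumably keep the named key rather than inline it, since sharing the key is the point of the proposal.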

      SERVER-88087 could have leveraged this syntax.

      Ideas for a better syntax are welcome.

      I think this is a small-to-medium-sized project, since we need to:

      1. define a new syntax for $group
      2. support the new syntax in the classic pipeline
      3. support the new syntax in the SBE group
      4. support the new syntax in the SBE block group
      5. apply this new syntax to SERVER-88087 optimization

      The benefits of this proposal are:

      1. By exposing this syntax to users, we can encourage them to write optimized $group queries. Users can express their intention more clearly when a common sort pattern is shared across multiple $top/$bottom accumulators. This is the most important benefit and the main motivation.
      2. We can completely avoid the overhead of generating the same sort key multiple times, which has been found to be significant.
      3. A common time-series query pattern is $sort + $group with $first/$last, which can be optimized into $group with $top/$bottom. This proposal will optimize that pattern further.
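
      Benefit 2 can be illustrated with a toy in-memory model in which the sort key is built once per document and shared by a $top-like and a $bottom-like accumulator. This is a sketch only: all function names are hypothetical, the sample documents are made up, values are compared with plain JavaScript `<`/`>` rather than BSON ordering, and it does not reflect how the classic or SBE engines are implemented.

```javascript
let sortKeyBuilds = 0; // counts how often a sort key is generated

// Builds a comparable sort key: one [value, direction] pair per pattern field.
function makeSortKey(doc, pattern) {
  sortKeyBuilds++;
  return Object.entries(pattern).map(([field, dir]) => [doc[field], dir]);
}

// Negative if key a sorts before key b, positive if after, 0 if equal.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    const [va, dir] = a[i];
    const vb = b[i][0];
    if (va < vb) return -dir;
    if (va > vb) return dir;
  }
  return 0;
}

// Models {tm: {$top: ... output: "$m"}, bi: {$bottom: ... output: "$i"}}
// for a single group, with both accumulators sharing one sort key.
function groupTopBottom(docs, pattern) {
  let top = null;
  let bottom = null;
  for (const doc of docs) {
    const key = makeSortKey(doc, pattern); // generated once, used twice below
    if (top === null || compareKeys(key, top.key) < 0) top = { key, doc };
    if (bottom === null || compareKeys(key, bottom.key) > 0) bottom = { key, doc };
  }
  if (top === null) return null; // no documents in the group
  return { tm: top.doc.m, bi: bottom.doc.i };
}

const docs = [
  { time: 2, tag: "a", m: 10, i: 1 },
  { time: 1, tag: "a", m: 20, i: 2 },
  { time: 1, tag: "b", m: 30, i: 3 },
  { time: 3, tag: "a", m: 40, i: 4 },
];
const result = groupTopBottom(docs, { time: 1, tag: -1 });
// result is { tm: 30, bi: 4 }; sortKeyBuilds is 4, one per document
```

      If each accumulator built its own key, the same four documents would need eight key generations instead of four.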

            Assignee:
            kateryna.kamenieva@mongodb.com Katya Kamenieva
            Reporter:
            yoonsoo.kim@mongodb.com Yoon Soo Kim
            Votes:
            0
            Watchers:
            6
