Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-66707

Add ability to nest $group operations

    • Query Optimization

      Inspired by some similar capability in ElasticSearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html#run-sub-aggs 

      Here's an example. Data set:

      {_id: 1, party: "dem", state: "NY", metric: 10}
      {_id: 2, party: "dem", state: "NY", metric: 14}
      {_id: 3, party: "rep", state: "PA", metric: 16}
      {_id: 4, party: "dem", state: "PA", metric: 12}
      {_id: 5, party: "rep", state: "NY", metric: 8}
      {_id: 6, party: "dem", state: "PA", metric: 9} 

      I could run this aggregation:

      {
        $group: {
          _id: "$state",
          stateTotalMetric: {$sum: "$metric"},
          count: {$sum: 1},
          byParty: {
            $group: {
              _id: "$party",
              count: {$sum: 1},
              totalMetric: {$sum: "$metric"}
            }
          }
        }
      } 

      And I would get this result:

      {
        _id: "NY",
        stateTotalMetric: 32,
        count: 3,
        byParty: [
          {_id: "dem", count: 2, totalMetric: 24},
          {_id: "rep", count: 1, totalMetric: 8}
        ]
      }
      {
        _id: "PA",
        stateTotalMetric: 37,
        count: 3,
        byParty: [
          {_id: "dem", count: 2, totalMetric: 21},
          {_id: "rep", count: 1, totalMetric: 16}
        ]
      }

      This computation is obviously expressible today, but here's how you would have to do it:

      {
        $group: {
          _id: {state: "$state", party: "$party},
          count: {$sum: 1},
          totalMetric: {$sum: "$metric"}
        }
      },
      {
        $group: {
          _id: "$_id.state",
          stateTotalMetric: {$sum: "$totalMetric"},
          count: {$sum: "$count"},
          byParty: {
            $push: {
              _id: "$_id.party",
              count: "$count",
              totalMetric: "$totalMetric"
            }
          }
        }
      } 

      Obviously, this is longer and more cumbersome. It also gets worse and more error prone if I had used $avg instead of $sum, where it would be incorrect or at least very different to take an average of averages, and I would have to track counts and sums independently in the first $group.

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            charlie.swanson@mongodb.com Charlie Swanson
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: