[SERVER-66707] Add ability to nest $group operations Created: 23/May/22  Updated: 23/Feb/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: query-product-scope-3, query-product-urgency-3, query-product-value-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-66701 Optimize $addToSet accumulators into ... Backlog
is related to SERVER-29930 allow nested $facet in another $facet... Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

Inspired by some similar capability in ElasticSearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html#run-sub-aggs 

Here's an example. Data set:

{_id: 1, party: "dem", state: "NY", metric: 10}
{_id: 2, party: "dem", state: "NY", metric: 14}
{_id: 3, party: "rep", state: "PA", metric: 16}
{_id: 4, party: "dem", state: "PA", metric: 12}
{_id: 5, party: "rep", state: "NY", metric: 8}
{_id: 6, party: "dem", state: "PA", metric: 9} 

I could run this aggregation:

{
  $group: {
    _id: "$state",
    stateTotalMetric: {$sum: "$metric"},
    count: {$sum: 1},
    byParty: {
      $group: {
        _id: "$party",
        count: {$sum: 1},
        totalMetric: {$sum: "$metric"}
      }
    }
  }
} 

And I would get this result:

{
  _id: "NY",
  stateTotalMetric: 32,
  count: 3,
  byParty: [
    {_id: "dem", count: 2, totalMetric: 24},
    {_id: "rep", count: 1, totalMetric: 8}
  ]
}
{
  _id: "PA",
  stateTotalMetric: 37,
  count: 3,
  byParty: [
    {_id: "dem", count: 2, totalMetric: 21},
    {_id: "rep", count: 1, totalMetric: 16}
  ]
}

This computation is obviously expressible today, but here's how you would have to do it:

{
  $group: {
    _id: {state: "$state", party: "$party},
    count: {$sum: 1},
    totalMetric: {$sum: "$metric"}
  }
},
{
  $group: {
    _id: "$_id.state",
    stateTotalMetric: {$sum: "$totalMetric"},
    count: {$sum: "$count"},
    byParty: {
      $push: {
        _id: "$_id.party",
        count: "$count",
        totalMetric: "$totalMetric"
      }
    }
  }
} 

Obviously, this is longer and more cumbersome. It also gets worse and more error prone if I had used $avg instead of $sum, where it would be incorrect or at least very different to take an average of averages, and I would have to track counts and sums independently in the first $group.



 Comments   
Comment by Kyle Suarez [ 17/Feb/23 ]

Thanks kateryna.kamenieva@mongodb.com, unassigning this from you.

Comment by Katya Kamenieva [ 17/Feb/23 ]

Interesting idea, but probably not too urgent. let's keep it in the backlog.

Comment by Charlie Swanson [ 26/May/22 ]

Also worth considering adding other $goup-like stages $bucket and $bucketAuto as accumulators.

Generated at Thu Feb 08 06:06:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.