[SERVER-66701] Optimize $addToSet accumulators into $group stages where possible Created: 23/May/22  Updated: 31/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-66707 Add ability to nest $group operations Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

I have seen a number of customer queries use $addToSet in scenarios where it is unnecessary and is in fact a bad choice, because it materializes a giant array. For example:

 [
  {
    "$match": {
      "myField.target": "TARGET"
    }
  },
  {
    "$group": {
      "_id": {},
      "value": {
        "$addToSet": "$metadata.something_id"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "value": {
        "$size": "$value"
      }
    }
  }
]

This query only needs the size of the resulting "value" array, so it could be rewritten to group by the field first and then count the resulting groups:

[
  {
    "$match": {
      "myField.target": "TARGET"
    }
  },
  {
    "$group": {
      "_id": "$metadata.something_id"
    }
  },
  {
    "$group": {
      "_id": {},
      "value": {"$sum": 1}
    }
  },
  {
    "$project": {
      "_id": 0
    }
  }
] 
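
To illustrate why the two pipelines above should return the same count, here is a minimal in-memory sketch in Python. It is not how the server evaluates either pipeline; the function names, the `get_path` helper, and the sample documents are hypothetical, and the sketch assumes every document contains the "metadata.something_id" field. The $match stage is left out because it is the same in both forms.

```python
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Hypothetical helper: follow a dotted path through nested dicts."""
    current: Any = doc
    for key in dotted.split("."):
        current = current[key]  # assumes the field is present in every document
    return current


def count_via_add_to_set(docs: List[Dict[str, Any]], path: str) -> int:
    """Models the original form: collect all distinct values into one
    array-like result (as $addToSet would), then take its size."""
    value = set()
    for doc in docs:
        value.add(get_path(doc, path))
    return len(value)


def count_via_two_groups(docs: List[Dict[str, Any]], path: str) -> int:
    """Models the rewritten form: the first $group produces one group per
    distinct value, the second $group adds 1 for each of those groups."""
    groups = {get_path(doc, path): None for doc in docs}  # first $group
    total = 0
    for _ in groups:                                      # second $group
        total += 1                                        # {"$sum": 1}
    return total


# Hypothetical sample data: three distinct ids across four documents.
docs = [
    {"metadata": {"something_id": 1}},
    {"metadata": {"something_id": 2}},
    {"metadata": {"something_id": 1}},
    {"metadata": {"something_id": 3}},
]

path = "metadata.something_id"
print(count_via_add_to_set(docs, path), count_via_two_groups(docs, path))
```

Both functions count distinct values, so they agree on this data. The difference the ticket cares about is not visible in a small in-memory model: in the rewritten pipeline no single output document has to hold the whole set of values as an array.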

 


Generated at Thu Feb 08 06:06:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.