[SERVER-37242] $group on sort key (after $sort) could be optimized to avoid blocking
Created: 21/Sep/18  Updated: 06/Dec/22  Resolved: 21/Sep/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Andrew Ryder (Inactive)
Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-4507 aggregation: optimize $group to take... Backlog
Assigned Teams:
Query
Participants:

 Description   

$group appears to be a blocking stage under all conditions. A trivial grouping on a key by which the input is already sorted could avoid blocking to some degree and would definitely use less memory.

 

For example, using the "tweets" data set from the University with an index on "user.screen_name":

> db.tweets.aggregate([
    { $sort: { "user.screen_name": 1 } },
    { $group: { _id: "$user.screen_name", tweets: { $push: "$$CURRENT" } } }
  ])

Fails with:

"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."

The results going into the $group are already sorted on the grouping key. Each time a new key value is consumed, the previous key value can never be seen again, so the bucket for the previous value could be emitted and a new bucket begun with no need to block.

Notably, this would reduce the memory footprint of the $group stage to roughly one bucket at a time, which would keep it, in these cases, from exceeding the limit.
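
As a rough illustration of the idea, here is a minimal sketch in plain JavaScript. It is not the server's implementation: streamingGroup and keyOf are hypothetical names, the input is an in-memory stand-in for the sorted collection, and keys are compared with !== rather than with BSON comparison semantics.

```javascript
// Hypothetical helper (not a MongoDB API): groups documents that arrive
// already sorted by the grouping key, emitting each bucket as soon as the
// key changes instead of holding every bucket until the input is exhausted.
function* streamingGroup(sortedDocs, keyOf) {
  let started = false;
  let currentKey;
  let bucket = [];
  for (const doc of sortedDocs) {
    const key = keyOf(doc);
    if (started && key !== currentKey) {
      // Input is sorted, so currentKey cannot reappear: emit its bucket now.
      yield { _id: currentKey, tweets: bucket };
      bucket = [];
    }
    started = true;
    currentKey = key;
    bucket.push(doc); // plays the role of $push: "$$CURRENT"
  }
  if (started) {
    // Flush the final bucket once the input ends.
    yield { _id: currentKey, tweets: bucket };
  }
}

// Small in-memory stand-in for tweets sorted by user.screen_name.
const sorted = [
  { user: { screen_name: "alice" }, text: "a1" },
  { user: { screen_name: "alice" }, text: "a2" },
  { user: { screen_name: "bob" }, text: "b1" },
];
const groups = [...streamingGroup(sorted, (d) => d.user.screen_name)];
```

Only one bucket is held at a time, so the working set is bounded by the largest single group rather than by the whole input.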

 



 Comments   
Comment by Andrew Ryder (Inactive) [ 24/Sep/18 ]

How embarrassing. I don't know how I missed that ticket in my search.

Comment by David Storch [ 21/Sep/18 ]

The "streaming $group" optimization is already tracked by SERVER-4507. Closing as a duplicate.
