[SERVER-37242] $group on sort key (after $sort) could be optimized to avoid blocking
Created: 21/Sep/18 | Updated: 06/Dec/22 | Resolved: 21/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andrew Ryder (Inactive) | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query |
| Participants: | |
| Description |
|
$group appears to be a blocking stage regardless of its input. When the input is already sorted on the group key, a trivial grouping could avoid blocking to some degree and would definitely use less memory.
For example, using the "tweets" data set from the University with an index on "user.screen_name":
Fails with: "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."
The results going into the $group are already sorted, so every new key value consumed means the previous value can never be seen again. The bucket for the previous value could be emitted and a new bucket started, with no need to block. Notably, this would reduce the memory footprint of the $group stage and, in these cases, prevent it from ever exceeding the limit.
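The streaming behaviour described above can be illustrated with a minimal Python sketch. This is not MongoDB's implementation. The function name `streaming_group_count` is hypothetical, the per-key count is just an illustrative accumulator, and the sample documents are invented for the example. It assumes the input is already sorted on the group key, as it would be after a $sort backed by the "user.screen_name" index.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def streaming_group_count(
    docs: Iterable[Dict[str, Any]],
    key: Callable[[Dict[str, Any]], Any],
) -> Iterator[Tuple[Any, int]]:
    """Yield (group_key, count) pairs from input that is ALREADY sorted by key.

    Only one bucket is held in memory at a time. When the key changes, the
    previous bucket is finished (sorted input guarantees that key never
    reappears), so it is emitted immediately instead of waiting for the
    end of the input.
    """
    sentinel = object()  # marks "no bucket started yet"
    current = sentinel
    count = 0
    for doc in docs:
        k = key(doc)
        if current is not sentinel and k != current:
            yield current, count  # previous bucket is complete
            count = 0
        current = k
        count += 1
    if current is not sentinel:
        yield current, count  # flush the final bucket


# Hypothetical sample data, sorted on user.screen_name.
sample = [
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "bob"}},
    {"user": {"screen_name": "carol"}},
    {"user": {"screen_name": "carol"}},
    {"user": {"screen_name": "carol"}},
]

result = list(streaming_group_count(sample, lambda d: d["user"]["screen_name"]))
# result == [("alice", 2), ("bob", 1), ("carol", 3)]
```

Because the function is a generator, the first group is available after reading only one document past its last member, and memory use is bounded by a single bucket rather than by the number of distinct keys.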
|
| Comments |
| Comment by Andrew Ryder (Inactive) [ 24/Sep/18 ] |
|
How embarrassing. I don't know how I missed that ticket in my search. |
| Comment by David Storch [ 21/Sep/18 ] |
|
The "streaming $group" optimization is already tracked by SERVER-4507. Closing as a duplicate. |