[SERVER-9377] Allow collecting "top" N values for each group

Created: 17/Apr/13  Updated: 16/Sep/22  Resolved: 06/Jan/22

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0 |
| Type: | Improvement |
| Priority: | Major - P3 |
| Reporter: | Asya Kamsky |
| Assignee: | Katya Kamenieva |
| Resolution: | Done |
| Votes: | 77 |
| Labels: | accumulator, expression |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | QE 2021-10-18, QE 2021-11-29, QE 2021-12-13 |
| Participants: | |
| Case: | (copied to CRM) |
Description

Issue Status as of Jan 6, 2022

Summary: Resolved in 5.2.0 by new accumulators that collect the top (or bottom) N values for each group, including $topN, $bottomN, $firstN, $lastN, $maxN, and $minN. A sketch of the new syntax follows the description below.

Previous Description: Analogous to {$group: {_id: "$key", maxval: {$max: "$val"}}}, if a user needs to gather the top N values per key (the most recent N, the highest N, etc.), they should have the ability to do the equivalent of {$max: "$val", $limit: 5}, or a $push with a $sort and a $limit: N.
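A minimal sketch of the 5.2 syntax, applied to the scores collection used in the comments below (the user field is an assumption, not named in this ticket):

```js
// Top 3 scores per game with the $topN accumulator (new in 5.2).
// Collection and field names (scores, game, score) follow the leaderboard
// example in the comments; "user" is an assumed field.
db.scores.aggregate([
  { $group: {
      _id: "$game",
      top3: { $topN: {
        n: 3,                                      // keep N values per group
        sortBy: { score: -1 },                     // highest scores first
        output: { user: "$user", score: "$score" }
      } }
  } }
])
```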
Comments
Comment by Katya Kamenieva [ 06/Jan/22 ]

This will be included in the Atlas Rapid Release version 5.2.0 and will be available in the LTS version 6.0.
Comment by Asya Kamsky [ 02/Apr/20 ]

I just realized there is a workaround that, depending on the number of keys being grouped, could be much better than pushing everything into one array and then $slicing it. Assume a collection called scores which has an index on {game: 1, score: -1} and typical documents looking like this:
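A plausible document shape, assuming a hypothetical user field alongside the game and score fields named above:

```js
// One illustrative document from the "scores" collection (_id omitted);
// the {game: 1, score: -1} index covers the per-game sort used below.
{ game: "G1", user: "u12345", score: 9712 }
```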
To get a leaderboard for three specific games, run this aggregation:
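A sketch of that aggregation as described, assuming hypothetical game IDs "G1", "G2", and "G3" and the fields above:

```js
// Reduce to one document per distinct game, then $lookup back into the
// same collection so each per-game sub-pipeline can use the
// {game: 1, score: -1} index to fetch only that game's top 3 scores.
db.scores.aggregate([
  { $match: { game: { $in: ["G1", "G2", "G3"] } } }, // drop this stage for all games
  { $group: { _id: "$game" } },                      // one document per game
  { $lookup: {
      from: "scores",
      let: { g: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$game", "$$g"] } } },
        { $sort: { score: -1 } },
        { $limit: 3 }
      ],
      as: "top3"
  } }
])
```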
To get the same for all the games, just remove the $match.

In my test dataset, which has 100 games and 1.13 million total documents, the above aggregation runs in 39ms for all games; for a subset of just a few games it runs in 2-3ms.
Comment by Asya Kamsky [ 31/Mar/20 ]

ashutosh.mimani@inspirock.com there is no obvious workaround that can do this in a single aggregation pipeline. I imagine this could be implemented using several queries/aggregations, but I haven't been able to come up with a general solution in a single pipeline.
Comment by Ashutosh Mimani [ 19/Mar/20 ]

I can't use $push followed by $slice, as that hits the 100MB memory limit. This accumulator would have been a great solution for the problem. Are there known workarounds to not having this?
Comment by DANIELE Tassone [ 10/Mar/20 ]

Just came upon this issue today. I would really be happy to have it.
Comment by Benoit Labergri [ 29/Jan/18 ]

I am also worried about performance.
Comment by Ashutosh Mittal [ 27/Jan/18 ]

If the number of records is in the millions, but I want only 10 records in each group, is that going to cause a problem?
Comment by Asya Kamsky [ 25/Jan/18 ]

ashuSvirus if the number of records in each grouping is not too high, you can $sort and $push (without limit) and then use the $slice expression in the next stage to trim the result to the desired number of remaining (top) values. Using the example from the description to get a top-3 leaderboard per game:
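A sketch of that pipeline, again assuming a hypothetical user field:

```js
// Sort so each game's documents arrive highest-score-first, $push every
// score into one array per game, then $slice the array to the top 3.
// The full array is materialized, so very large groups can approach the
// 100MB memory limit mentioned above.
db.scores.aggregate([
  { $sort: { game: 1, score: -1 } },
  { $group: {
      _id: "$game",
      all: { $push: { user: "$user", score: "$score" } }
  } },
  { $project: { top3: { $slice: ["$all", 3] } } }
])
```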
While sorting on game isn't necessary, I'm assuming there may be an index to support that sort, and it would also reduce resource usage during grouping.
Comment by Ashutosh Mittal [ 20/Jan/18 ]

Is there any workaround for this one without performing an N+1 loop?
Comment by Asya Kamsky [ 20/Dec/16 ]

We are still tracking this as a highly desired feature, but we have not yet decided what version it will be implemented in.
Comment by Sachin Takkar [ 13/Dec/16 ]

Please comment if there are any plans to do this in the future.
Comment by Jagadish [ 15/May/15 ]

Are there any plans to do it? When will this feature be added?
Comment by Peter [ 30/Apr/14 ]

It would be really helpful for people transferring from relational DBMSs.
Comment by v s [ 14/Apr/14 ]

+1
Comment by Kamesh [ 05/Dec/13 ]

Any idea in which version this will be fixed?
Comment by Andy Ennamorato [ 25/Jul/13 ]

This would definitely be useful to have.
Comment by Flavien [ 24/Jul/13 ]

Is there an ETA for triaging this? Our planning depends on whether and when this gets implemented.
Comment by Flavien [ 24/Jul/13 ]

I would very much like that. Right now I have to issue 60 different queries to get the top 10 of various groups, when only one would be needed if we had this feature.