[SERVER-24799] $group aggregation command should maintain document order Created: 24/Jun/16 Updated: 06/Dec/22 Resolved: 17/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Billy Tetrud | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
I just wasted a ton of time before realizing that $group doesn't group by the order of input documents. This is incredibly counter-intuitive and makes the already-complicated aggregation more complex to understand. I don't see why $group shouldn't guarantee input order, and it would be very helpful if it did. |
| Comments |
| Comment by Wojciech Waga [ 18/Oct/21 ] | |||||||||||||||||||||||
|
@Asya Kamsky I just went into the same issue and although being intuitive or not is subjective we are running into objective problem as well - by ignoring data being ordered by the same key (with proper index and sort on that key) group becomes a non-streaming op which makes it subject to 100MB limit and gives worse experience and perrofmance in some cases. | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 09/Jul/20 ] | |||||||||||||||||||||||
|
Adding a note here since this ticket was linked as example of something that's a misunderstanding: $group always honors the order the documents are arriving into the $group stage, so if you $sort and then $group with $first, $last, $push etc then the order very much matters and will make a difference to each document within $group. What is never guaranteed (at least at this point) is that after the group stage the entire stream of documents would be ordered by "something". So if you want it ordered by group _id field, you have to sort after $group. If you want it sorted by some other field, you have to $sort by that after $group. | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 17/Jan/18 ] | |||||||||||||||||||||||
|
We haven't been able to determine a use case that requires new functionality so I'm closing this as Won't Fix. If you think that's a mistake, please provide an example with input documents, pipeline and desired output so we can see how the requested behavior would work. | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 24/Oct/16 ] | |||||||||||||||||||||||
|
An array can be unwound emitting index of each element along with value. Starting with 3.4 arrays can also be $zip'ed together with a generated sequence. It's not really clear to me why the "roads" example needs $group to maintain document order as you didn't describe what objective $group is fulfilling. This type of MongoDB-related support discussion would be best posted on the mongodb-user group. Asya | |||||||||||||||||||||||
| Comment by Andrea Di Cesare [ 24/Oct/16 ] | |||||||||||||||||||||||
|
I have a use case where maintaining the order of inputs document would make a big difference Suppose the collection "roads" whose elements refer to the elements of the collection "points". It is a 1:N relationship implemented via the array property "pointIds" on the collection "roads". This because the points of the roads are ordered (from the beginning to the end of the road). Just to clarify, the ordering of the "points" is purely maintained by the array, i.e. there are no properties of "points" that allow to deduct their order. An aggregation pipeline on roads that uses $lookup to retrieve the list of points cannot use $group because it would lose the information about the ordering. | |||||||||||||||||||||||
| Comment by Billy Tetrud [ 27/Sep/16 ] | |||||||||||||||||||||||
|
Asya, that's super helpful info. Thanks! | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 27/Sep/16 ] | |||||||||||||||||||||||
|
fresheneesz you can do this in upcoming 3.4 without any unwind and group using the new $reduce expression. This could be the $project stage to extract the field by which you want to sort via $sort:{"sortDate":1}:
In your sample document it creates new field "sortDate" which is the largest "date" value in the array element where field is "b". You can do it in the current 3.2 version via $filter and $max operators in $project stage as well and then sort by "sortDate.date":
| |||||||||||||||||||||||
| Comment by Billy Tetrud [ 05/Jul/16 ] | |||||||||||||||||||||||
|
Is that discussion somewhere I can view? Why wouldn't my requested behavior make sense? "if your application requires some ordering of documents, you would specify that somehow, e.g. via a $sort" Yes, and if you $sort something then use a $group command, the sort is invalidated. This doesn't make sense, as the output of all the other commands are respected in the aggregation pipeline. "Could you accomplish the ordering you need by adding a $sort stage after the $group?" No. Here's my use case. I have documents with nested object representing its history like this:
And I want to sort the documents by the last modified date of a certain field. So lets say that field is b, I want to sort by the date 142983476. In order to do this, I need to unwind the history, project it in a certain way to expose only the dates I want, then sort by those date, then group by _id. But if grouping doesn't respect sort order, this won't work. As a work around, I needed to build a much more complicated (and expensive) first projection stage, then after grouping I need an extra projection stage, then I can sort as the last stage. This was much harder to reason about and is more expensive to run. | |||||||||||||||||||||||
| Comment by Charlie Swanson [ 05/Jul/16 ] | |||||||||||||||||||||||
|
fresheneesz, after some discussion, we're not sure the notion of ordering by the sequence of input documents makes much sense. Generally, if your application requires some ordering of documents, you would specify that somehow, e.g. via a $sort. Can you elaborate on exactly why this behavior would be useful? Could you accomplish the ordering you need by adding a $sort stage after the $group? I'll note that this may be related to SERVER-4507, which will in some cases allow the $group stage to take advantage of sorted sequences, preserving the order. | |||||||||||||||||||||||
| Comment by Billy Tetrud [ 28/Jun/16 ] | |||||||||||||||||||||||
|
I suppose I've been working with Javascript too long and assumed that mongo mapping would work like Javascript mapping. But of course MongoDB is written in C.. Anyways, yes that's exactly what I mean. But the overhead should be pretty insignificant I would think. | |||||||||||||||||||||||
| Comment by Charlie Swanson [ 27/Jun/16 ] | |||||||||||||||||||||||
|
Hi fresheneesz, Just to clarify, you're looking for the order of output documents to match the order of input documents? The reason the order is not preserved is that we keep track of all groups with a map from the _id to the accumulated value, so that we can quickly look up which group an incoming document belongs to. For exemple, with the group stage
And inputs given in this order:
You would expect the following output?
This is definitely possible, but will require some extra overhead to keep track of the order of the groups. |