[SERVER-11392] $unwind on subdocuments Created: 25/Oct/13 Updated: 07/Apr/23 Resolved: 23/Mar/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 2.4.7 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Aaron Westendorf | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 9 |
| Labels: | expression, stage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Assigned Teams: |
Query
|
||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Description |
|
$unwind, or a new operator, should emit new records for each key-value pair in a subdocument. A good example of where this would be useful is if a document contained a histogram (e.g. counts of unique account ids), and one wanted to use the aggregation framework to calculate the number of unique keys in that histogram (e.g. number of unique accounts, i.e. the length of the subdocument). Additional description from
And a request, also from
|
| Comments |
| Comment by Charlie Swanson [ 23/Mar/16 ] | ||||||||||||||||||||||||||||||||||
|
Hi all, I'm closing this as a duplicate of If you have a use case that would not be met by these new operators, feel free to re-open, and clarify what you would need to accomplish your desired behavior. | ||||||||||||||||||||||||||||||||||
| Comment by Robert Beekman [ 09/Dec/15 ] | ||||||||||||||||||||||||||||||||||
|
This would be a great addition to the aggregation framework. A lot of examples involving MongoDB and analytics use the $inc operator e.g. {'$inc' => {'metric_name' => 1}}, but you can only process these values in a Map/Reduce right now and not in an aggregation. Basically you can either use $inc to increment/insert a lot of data fast and easily and must use map/reduce to aggregate this data, or use arrays / insert separate documents and do aggregations. | ||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 11/Nov/15 ] | ||||||||||||||||||||||||||||||||||
|
See also | ||||||||||||||||||||||||||||||||||
| Comment by John A. De Goes [ 07/Nov/15 ] | ||||||||||||||||||||||||||||||||||
|
By the way, I should mention an entirely different approach, which is providing an operator to gather key/value pairs of a subdocument into an array. For example:
You can see this is easily equivalent, because from this state you can $unwind the array. However, as historically Mongo's planner has not been very smart, I'd say this will probably be less efficient than just adding a special-case $unravel type operator to do everything in one fell swoop. | ||||||||||||||||||||||||||||||||||
| Comment by John A. De Goes [ 07/Nov/15 ] | ||||||||||||||||||||||||||||||||||
|
charlie.swanson The use cases all involve analytics on the schema. As you probably know, the keys inside MongoDB documents are often used as keys in a map for purposes of real-time aggregation. Often these keys are dynamic and come from user-generated data or things like bucketed timestamps. Since this is actual data and it's being stored in the "schema", it's imperative that it is possible to process it dynamically with the aggregation pipeline, which in turn necessitates an operator capable of unwinding the key/value pairs of a document so they can be grouped, summed, and so forth. I'd propose that this operator (sometimes called $unravel) be able to unwind the key/value pairs of a document in the same way the $unwind operator can unwind the index/value pairs of an array. For this operator, one level is fine; the more general case of arbitrarily deep unwinding is not even possible for $unwind, but could be dealt with separately in new operators when the need arises $unwindAll / $unravelAll.
If you like you could make it more like $unwind and omit the key by default and give user choice of the "key field name", or you could shorten the names of the key/value fields to "k" and "v", or whatever. But this is the basic functionality needed here. Note that a request for this functionality comes up all the time on StackOverflow and mongodb-user, albeit most users do not know that a proper solution to ticket will actually solve the problem they have. | ||||||||||||||||||||||||||||||||||
| Comment by Sylvain Leroux [X] [ 16/Jun/15 ] | ||||||||||||||||||||||||||||||||||
|
I don't have the required permissions to access https://jira.mongodb.org/browse/PARTNER-214, so I comment here. First, sorry charlie.swanson for the late answer, but I took time to think more about that feature, and as you will see in the following example, I've slightly changed my mind about the syntax/semantic. Next, here is a concrete use case – given a collection tracking the results for some sport or game event:
I won't argue if this is a good design or not. But, at the very least, it is fast and easy to add/update the result for a team for some event. A simple $set will do the trick. Using an array here, I think it would be more tricky to avoid duplicates and to deal with concurrent updates of the results of a different team for the same event. Anyway, given this design, I cannot find the average score for a given event. Nor the number of participating teams. One might argue that should be done at application-level. I would agree for one event. But what if I need for find the average/lowest/highest score for the year 2015 ? Transferring a potentially high number of documents to the client just for that seems not to be optimal. What I have in mind now would be something like that (please note, unlike what I said in a previous comment, I now think that mapping a document to an array of key/value document is a better idea, as it would allow easy access to the value):
I imagine an intermediate collection like that after the $project step:
| ||||||||||||||||||||||||||||||||||
| Comment by Mark Hall [ 16/Jun/15 ] | ||||||||||||||||||||||||||||||||||
|
The approach to the problem that we were trying to solve has moved away from the agg pipeline - see https://jira.mongodb.org/browse/PARTNER-214. | ||||||||||||||||||||||||||||||||||
| Comment by Rod Adams [ 12/Jun/15 ] | ||||||||||||||||||||||||||||||||||
|
I don't believe the use case I referenced needed recursive behavior, but please reference the description in | ||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 12/Jun/15 ] | ||||||||||||||||||||||||||||||||||
|
aaron.westendorf or Sylvain Leroux, can you please give a concrete example of why this would be useful? Why do you need to do this? Maybe some sample data with names other than a, b, c would make it more clear. rod.adams, is the recursive behavior important to you? Or is it just to save on some extra stages? Can you please also give an example of what this would enable you to do? mhall, would you be able to describe your use case? I imagine that some sort of $docUnwind (or whatever it ends up being called) combined with a resolution of For all of you, would any of your use cases be solved if To be clear, I think this definitely has potential uses, but it seems that each of you are looking for slightly different things. From what I can tell, each problem seems similar and could potentially be solved by introducing a single general operator, be it an aggregation stage or an expression to use in $project or $group. I'd like to work out what that might look like. | ||||||||||||||||||||||||||||||||||
| Comment by Sylvain Leroux [X] [ 06/Jun/15 ] | ||||||||||||||||||||||||||||||||||
|
A great feature to have. But I'm not quite sure of the function semantic. Given the first example, it will have to generate new field names. Is there any other function doing that ? At the very least, `$unwind` don't do it as it reuses the existing field name. More important, still in the first example, why `$docUnwind` would recursively unwind the embedded documents ? Once again, this is the opposite of the current behavior of `$unwind`. Or am I wrong ? After a second though, `$unwind` has to be implemented as a separate pipeline stage as it will map one document to 0-to-n documents. But this not necessarily the case here: I wonder why this sort of operation couldn't be implemented instead as an projection operator, mapping an object to a key-value array. Something like that:
Letting the recursive behavior aside from the discussion, you only have to add an extra plain `$unwind` then `$project` stage to get back your original result. As an added benefit, it seems to me the semantic for the complementary operation would now be straightforward:
I must concede that both are not quite symmetric for root documents. But it should be for embedded documents. | ||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 27/Sep/14 ] | ||||||||||||||||||||||||||||||||||
|
|