[SERVER-11392] $unwind on subdocuments Created: 25/Oct/13  Updated: 07/Apr/23  Resolved: 23/Mar/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.4.7
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Aaron Westendorf Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 9
Labels: expression, stage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-18794 Add an aggregation expression to conv... Closed
is duplicated by SERVER-15175 Add a way to "unwind" fields in a doc... Closed
is duplicated by SERVER-17709 New aggregation operator - $unravel Closed
Related
related to SERVER-5947 Add ability to project key names as v... Closed
Assigned Teams:
Query
Participants:

 Description   

$unwind, or a new operator, should emit new records for each key-value pair in a subdocument. A good example of where this would be useful is if a document contained a histogram (e.g. counts of unique account ids), and one wanted to use the aggregation framework to calculate the number of unique keys in that histogram (e.g. number of unique accounts, i.e. the length of the subdocument).

Additional description from SERVER-15175:

Example function:

db.coll.insert({id: 'foo', a: 100, b: 200, c: 300, d: {e: 400, f:500}})
db.coll.aggregate([{$docUnwind: {id: 0}},  // syntax similar to $project for whitelisting or blacklisting which fields to unwind
])
{id: 'foo', f_name: 'a', f_value: 100}
{id: 'foo', f_name: 'b', f_value: 200}
{id: 'foo', f_name: 'c', f_value: 300}
{id: 'foo', f_name: 'd.e', f_value: 400}
{id: 'foo', f_name: 'd.f', f_value: 500}

In short, it will take the fields in a document, and unwind a portion of them into name, value pairs, making a separate document for each.

One use case for this is collecting statistics across a dynamic list of fields.

And a request, also from SERVER-15175:

It is critical (for our use case) to have the type of the field (BSON id) in the output as well - e.g.

{id: 'foo', f_name:  'a', f_type: 1, f_value: 100}

Some examples for arrays:

db.coll.insert({id: 'foo', a: [100, 200], b : [{c: 300, f: 'fred'}]})
db.coll.aggregate([{$docUnwind: {id: null}}, ...])  // process all documents like $group can
{id: 'foo', f_name: 'a[]', f_type: 1} // the [] in the name tells us that this is a terminal array of primitives; value omitted in this case perhaps?
{id: 'foo', f_name: 'b[].c', f_type: 1, f_value: 300}
{id: 'foo', f_name: 'b[].f',  f_type: 2, f_value: 'fred'}



 Comments   
Comment by Charlie Swanson [ 23/Mar/16 ]

Hi all,

I'm closing this as a duplicate of SERVER-18794, since I believe that will capture all use cases described here. It may be that some use cases will need to use other expressions/stages as well, such as those proposed in SERVER-23313 or SERVER-23310.

If you have a use case that would not be met by these new operators, feel free to re-open, and clarify what you would need to accomplish your desired behavior.

Comment by Robert Beekman [ 09/Dec/15 ]

This would be a great addition to the aggregation framework. A lot of examples involving MongoDB and analytics use the $inc operator e.g. {'$inc' => {'metric_name' => 1}}, but you can only process these values in a Map/Reduce right now and not in an aggregation.

Basically you can either use $inc to increment/insert a lot of data fast and easily and must use map/reduce to aggregate this data, or use arrays / insert separate documents and do aggregations.

Comment by Ramon Fernandez Marina [ 11/Nov/15 ]

See also SERVER-17709 for a similar proposal, but using a new operator to avoid any possible confusion on $unwind.

Comment by John A. De Goes [ 07/Nov/15 ]

By the way, I should mention an entirely different approach, which is providing an operator to gather key/value pairs of a subdocument into an array.

For example:

{"tags": {"nosql": 123, "analytics": 2392, "mongodb": 2}}
 
-->
 
{$gather: "$tags"}
 
-->
 
{"tags": [{"key": "nosql", "value": 123}, {"key": "analytics", "value": 2392}, {"key": "mongodb", "value": 2}]}

You can see this is easily equivalent, because from this state you can $unwind the array. However, as historically Mongo's planner has not been very smart, I'd say this will probably be less efficient than just adding a special-case $unravel type operator to do everything in one fell swoop.

Comment by John A. De Goes [ 07/Nov/15 ]

charlie.swanson The use cases all involve analytics on the schema. As you probably know, the keys inside MongoDB documents are often used as keys in a map for purposes of real-time aggregation. Often these keys are dynamic and come from user-generated data or things like bucketed timestamps. Since this is actual data and it's being stored in the "schema", it's imperative that it is possible to process it dynamically with the aggregation pipeline, which in turn necessitates an operator capable of unwinding the key/value pairs of a document so they can be grouped, summed, and so forth.

I'd propose that this operator (sometimes called $unravel) be able to unwind the key/value pairs of a document in the same way the $unwind operator can unwind the index/value pairs of an array. For this operator, one level is fine; the more general case of arbitrarily deep unwinding is not even possible for $unwind, but could be dealt with separately in new operators when the need arises $unwindAll / $unravelAll.

{"tags": {"nosql": 123, "analytics": 2392, "mongodb": 2}}
 
-->
 
{$unravel: "$tags"}
 
-->
 
{"tags": {"key": "nosql", "value": 123}
{"tags": {"key": "analytics", "value": 2392}
{"tags": {"key": "mongodb", "value": 2}

If you like you could make it more like $unwind and omit the key by default and give user choice of the "key field name", or you could shorten the names of the key/value fields to "k" and "v", or whatever. But this is the basic functionality needed here.

Note that a request for this functionality comes up all the time on StackOverflow and mongodb-user, albeit most users do not know that a proper solution to ticket will actually solve the problem they have.

Comment by Sylvain Leroux [X] [ 16/Jun/15 ]

I don't have the required permissions to access https://jira.mongodb.org/browse/PARTNER-214, so I comment here.

First, sorry charlie.swanson for the late answer, but I took time to think more about that feature, and as you will see in the following example, I've slightly changed my mind about the syntax/semantic.

Next, here is a concrete use case – given a collection tracking the results for some sport or game event:

{
 _id: 1,
  event: "World Cup",
  year: 2015,
  results: {
    "The Sharks": 32,
    "Red Team": 10,
    "Kangaroos": 48,
  }
},
{
 _id: 2
  event: "European Games",
  year: 2015,
  results: {
    "Red Team": 24,
    "Blue Team": 12,
  }
}

I won't argue if this is a good design or not. But, at the very least, it is fast and easy to add/update the result for a team for some event. A simple $set will do the trick. Using an array here, I think it would be more tricky to avoid duplicates and to deal with concurrent updates of the results of a different team for the same event.

Anyway, given this design, I cannot find the average score for a given event. Nor the number of participating teams. One might argue that should be done at application-level. I would agree for one event. But what if I need for find the average/lowest/highest score for the year 2015 ? Transferring a potentially high number of documents to the client just for that seems not to be optimal.

What I have in mind now would be something like that (please note, unlike what I said in a previous comment, I now think that mapping a document to an array of key/value document is a better idea, as it would allow easy access to the value):

db.collection.aggregate([
  {$match: { event:  "World Cup", year: "2015" }},
  {$project: { results: { $itemize: "$results" }}},
  //                               ^^^^^^^^^^^^^^^^^
  {$unwind: { results: "$results" }},
  {$group: {_id: "$_id", total_score: {$sum: "$results.value"}, team_count: {$sum: 1}}}
])

I imagine an intermediate collection like that after the $project step:

{
 _id: 1,
 results: [
    {key: "The Sharks", value: 32},
    {key: "Red Team", value: 10},
    {key: "Kangaroos", value: 48},
  ]
}

Comment by Mark Hall [ 16/Jun/15 ]

The approach to the problem that we were trying to solve has moved away from the agg pipeline - see https://jira.mongodb.org/browse/PARTNER-214.

Comment by Rod Adams [ 12/Jun/15 ]

I don't believe the use case I referenced needed recursive behavior, but please reference the description in SERVER-15175 for an example.

Comment by Charlie Swanson [ 12/Jun/15 ]

aaron.westendorf or Sylvain Leroux, can you please give a concrete example of why this would be useful? Why do you need to do this? Maybe some sample data with names other than a, b, c would make it more clear.

rod.adams, is the recursive behavior important to you? Or is it just to save on some extra stages? Can you please also give an example of what this would enable you to do?

mhall, would you be able to describe your use case? I imagine that some sort of $docUnwind (or whatever it ends up being called) combined with a resolution of SERVER-13447 would be enough?

For all of you, would any of your use cases be solved if SERVER-5947 were resolved?

To be clear, I think this definitely has potential uses, but it seems that each of you are looking for slightly different things. From what I can tell, each problem seems similar and could potentially be solved by introducing a single general operator, be it an aggregation stage or an expression to use in $project or $group. I'd like to work out what that might look like.

Comment by Sylvain Leroux [X] [ 06/Jun/15 ]

A great feature to have. But I'm not quite sure of the function semantic. Given the first example, it will have to generate new field names. Is there any other function doing that ? At the very least, `$unwind` don't do it as it reuses the existing field name.

More important, still in the first example, why `$docUnwind` would recursively unwind the embedded documents ? Once again, this is the opposite of the current behavior of `$unwind`. Or am I wrong ?

After a second though, `$unwind` has to be implemented as a separate pipeline stage as it will map one document to 0-to-n documents. But this not necessarily the case here: I wonder why this sort of operation couldn't be implemented instead as an projection operator, mapping an object to a key-value array. Something like that:

db.coll.insert({id: 'foo', a: 100, b: 200, c: 300, d: {e: 400, f:500}})
db.coll.aggregate({$project: { data: { $itemize: { id: 0 } } } })
{ data: [['a', 100], ['b', 200], ['c', 300], ['d', {e: 400, f:500}]] }

Letting the recursive behavior aside from the discussion, you only have to add an extra plain `$unwind` then `$project` stage to get back your original result. As an added benefit, it seems to me the semantic for the complementary operation would now be straightforward:

db.coll.insert({data: [['a', 100],['b', 200],['c', 300]]})
db.coll.aggregate({$project: { obj: { $deitemize: "$data" } })
{ obj: { a: 100, b: 200, c: 300}}

I must concede that both are not quite symmetric for root documents. But it should be for embedded documents.

Comment by Asya Kamsky [ 27/Sep/14 ]

SERVER-5947 seems relevant/related.

Generated at Thu Feb 08 03:25:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.