[SERVER-16281] $setUnion does not descend into nested arrays Created: 21/Nov/14  Updated: 30/May/19  Resolved: 22/Nov/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.6.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: BatScream [X] Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-31991 Allow n-ary aggregation expressions t... Backlog
Operating System: ALL
Steps To Reproduce:

Sample data:

db.s.insert([{"a":1}, {"a":1}, {"a":2}, {"a":2}])

db.s.aggregate([
{$group:{"_id":"$a","s":{$push:"$a"}}},
{$group:{"_id":null,"ds":{$push:"$s"}}},
{$project:{"out":{$setUnion:"$ds"}}}])

The expectation is for `out` to be the array [1,2]. Instead, $setUnion performs no de-duplication in this case and returns the "ds" array unchanged.

 Description   

The $setUnion operator doesn't work as stated in the docs. It doesn't work when its input is a variable which is an array of arrays.



 Comments   
Comment by BatScream [X] [ 21/Feb/15 ]

Hi Stephen,

Thanks for the explanation. I understand that this feature works as designed. However, given the need for $setUnion to evaluate a variable that resolves to an array of arrays, could you add this to the list of requested enhancements?

Take this question (http://stackoverflow.com/questions/28637992/group-documents-and-merge-subdocuments-arrays/28638229#28638229) as an example, where I would like to get the set of sizes per city.

[{
"city": "NY",
"size": [1,2,3,4,5]
},
{
"city": "NY",
"size": [1,2,8]
}]

It would be great if the following $project stage worked:

db.col.aggregate([
{$group:{"_id":"$city","size":{$push:"$size"}}},
{$project:{"size":{$setUnion:"$size"}}}
])

to get the output as:

{
"_id": "NY",
"size": [1,2,3,4,5,8]
}

But currently it gives the output:

{
"_id": "NY",
"size": [[1,2,3,4,5],[1,2,8]] // not implicitly treated as an array of arrays.
}

I understand that there are other ways to implement this, but this feature would save a lot of effort.

Awaiting your response.
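
As a rough sketch of the "other ways" mentioned above, the two pipelines below should produce a per-city set of sizes for the sample documents. The variable names `unwindPipeline` and `reducePipeline` are illustrative only and do not come from the report. The second option assumes MongoDB 3.4 or later, where `$reduce` is available, which is newer than the 2.6.5 release this issue affects.

```javascript
// Sketch only: two possible pipelines for a per-city set of sizes.
// Variable names are illustrative and not part of the original report.

// Option 1: flatten each "size" array first, then collect unique values.
var unwindPipeline = [
    { $unwind: "$size" },
    { $group: { "_id": "$city", "size": { $addToSet: "$size" } } }
];

// Option 2 (assumes MongoDB 3.4+ for $reduce): push the arrays as before,
// then fold them together with a two-argument $setUnion.
var reducePipeline = [
    { $group: { "_id": "$city", "size": { $push: "$size" } } },
    { $project: { "size": { $reduce: {
        input: "$size",
        initialValue: [],
        in: { $setUnion: ["$$value", "$$this"] }
    } } } }
];

// In the mongo shell, either could be run as:
//   db.col.aggregate(unwindPipeline)
//   db.col.aggregate(reducePipeline)
```

Neither `$addToSet` nor `$setUnion` guarantees the order of elements in the resulting array, so the output may not be sorted as [1,2,3,4,5,8].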

Comment by Stennie Steneker (Inactive) [ 22/Nov/14 ]

Hi Clement,

Since your ds field is an array of arrays, the elements are [1,1] and [2,2]. As noted, the $setUnion operator will not expand the nested elements.

Your aggregation output is the equivalent of:

$setUnion: [
    [
        [ 1, 1 ],
        [ 2, 2 ]
    ]
]

Rather than the example you are citing/expecting which would be:

$setUnion: [
    [ 1, 1 ],
    [ 2, 2 ]
]

Perhaps this example is clearer:

db.docs.insert({
    // one dimensional arrays
    a: [1,1],   b: [2,2],   c: [2,2],
    // array of arrays
    d: [[1,1]], e: [[2,2]], f: [[2,2]]
});

db.docs.aggregate(
    { $project: {
        array: { $setUnion: ["$a", "$b", "$c"] },
        arrayOfArray: { $setUnion: ["$d", "$e", "$f"] }
    }}
)

The one dimensional arrays are de-duplicated at the element level; nested arrays are compared for equivalence but not recursively expanded:

{
  "result": [
    {
      "_id": ObjectId("546fff93f3889a8477b41611"),
      "array": [
        1,
        2
      ],
      "arrayOfArray": [
        [
          1,
          1
        ],
        [
          2,
          2
        ]
      ]
    }
  ],
  "ok": 1
}
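
The top-level-only behaviour shown above can be modeled in plain JavaScript. This is a simplified sketch, not the server's implementation: `setUnionModel` is a hypothetical helper, it compares elements by their JSON text, and it keeps first-seen order, whereas the real operator does not promise any order.

```javascript
// Hypothetical model of $setUnion: de-duplicates the top-level elements of
// its argument arrays and never descends into nested arrays.
function setUnionModel(...sets) {
  const seen = new Set();
  const out = [];
  for (const set of sets) {
    for (const element of set) {
      // Assumption: JSON text stands in for the server's value comparison.
      const key = JSON.stringify(element);
      if (!seen.has(key)) {
        seen.add(key);
        out.push(element);
      }
    }
  }
  return out;
}

const ds = [[1, 1], [2, 2]];

// $setUnion: "$ds" behaves like a single argument: the elements compared
// are the inner arrays [1,1] and [2,2], which are distinct.
console.log(JSON.stringify(setUnionModel(ds)));    // [[1,1],[2,2]]

// $setUnion: [[1,1],[2,2]] passes two arguments: the elements compared
// are the numbers inside them.
console.log(JSON.stringify(setUnionModel(...ds))); // [1,2]
```

The only difference between the two calls is whether the outer array is one argument or a list of arguments, which is the distinction between the two `$setUnion` forms in this comment.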

Regards,
Stephen

Comment by BatScream [X] [ 22/Nov/14 ]

Hi Stephen and Asya,

"given an array of two elements it de-duplicates them.
In your case the elements happen to be arrays."

Yes. So why is it not treating "ds" as an array of arrays in $setUnion:"$ds"? If the syntax must always be $setUnion:[], it should throw an error, right?

From the docs:

Example: { $setUnion: [ [ "a", "b", "a" ], [ "b", "a" ] ] }
Result: [ "b", "a" ]

So isn't the expected behavior of $setUnion to resolve "$ds" to [[1,1],[2,2]] and give [1,2] as the output? Am I missing something here?

"If a set contains a nested array element, $setUnion does not descend into the nested array but evaluates the array at top-level."

Sorry to say, this does not apply to the scenario we are discussing. Please consider changing the title of the issue accordingly.

Comment by Stennie Steneker (Inactive) [ 22/Nov/14 ]

Hi Clement,

I would also note that the documented behaviour for $setUnion does cover arrays:

If a set contains a nested array element, $setUnion does not descend into the nested array but evaluates the array at top-level.

I'm guessing your example is contrived to try to illustrate what you were seeing, but in this case a simpler approach to get your expected output would be using $addToSet:

db.s.aggregate([
	{ $group: {
		"_id": null,
		"out": { $addToSet: "$a" }
	}}
])

If you have future support-related questions on working with aggregation or MongoDB, a more appropriate forum would be the mongodb-user discussion group or StackOverflow. MongoDB team members are active in these forums, and you can also benefit from the experience of other MongoDB users.

Regards,
Stephen

Comment by Asya Kamsky [ 22/Nov/14 ]

The operator is working exactly as expected - given an array of two elements it de-duplicates them.

In your case the elements happen to be arrays.
