[SERVER-14872] Aggregation pipeline project expression operator to concatenate multiple arrays into one
Created: 12/Aug/14  Updated: 05/Feb/16  Resolved: 15/Jun/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.1.5

Type: Improvement
Priority: Major - P3
Reporter: Jon Rangel (Inactive)
Assignee: Charlie Swanson
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-5919 Add set/array operations to $unwind i... Closed
Related
related to CSHARP-1364 Aggregation pipeline project expressi... Closed
related to DRIVERS-234 Aggregation Builder Support for 3.2 Closed
is related to SERVER-17258 Add $reduce expression operator for r... Closed
Backwards Compatibility: Fully Compatible
Sprint: Quint Iteration 4, Quint Iteration 5
Participants:

 Description   

I need to count the number of duplicate IDs that occur across all documents in a collection. However, this very simple aggregation is complicated by the fact that the IDs in question may appear in multiple fields in the input document.

Suppose I have input documents like the one below:

{
  a : [1, 2, 3],
  b : [2, 4, 5]
}

I'd like to transform this to a document that looks like:

{
  c : [1, 2, 3, 2, 4, 5]
}

From this point, it is straightforward to unwind and group on c to get to the desired result.

The above transformation is similar to what would be achieved using the $setUnion operator, but here I do not want to filter out duplicate values.

Therefore, this request is for an operator similar to $setUnion but which does not filter out duplicate values.
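
The intended end-to-end computation can be sketched in plain JavaScript, without a server. This is only a rough model of the transformation described above, not an aggregation pipeline: the function name countDuplicateIds and the second sample document are hypothetical and exist purely for illustration.

```javascript
// Hypothetical model of the desired pipeline: concatenate a and b per
// document (keeping duplicates), unwind, then group by value and count.
function countDuplicateIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Concatenate without de-duplicating, unlike $setUnion.
    const c = [...doc.a, ...doc.b];
    // "Unwind" c and "group" on each value.
    for (const id of c) {
      counts.set(id, (counts.get(id) || 0) + 1);
    }
  }
  // Keep only the IDs seen more than once across the whole collection.
  const duplicates = {};
  for (const [id, n] of counts) {
    if (n > 1) duplicates[id] = n;
  }
  return duplicates;
}

// The first document is the one from the description; the second is made up.
const sample = [
  { a: [1, 2, 3], b: [2, 4, 5] },
  { a: [5], b: [6] },
];
console.log(countDuplicateIds(sample));
```

Note that the ID 2 is counted twice because it appears in both a and b of the same document, which is exactly the information $setUnion would discard.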



 Comments   
Comment by Asya Kamsky [ 09/Sep/15 ]

"will evaluate to null if any of the inputs are null"

Just highlighting this behavior, and adding a note for documentation: if the quoted behavior is not desired (i.e. if you want null concatenated with an array field to be equal to just that array), then you should add $ifNull:

{$project: { newArray:  {$concatArrays: [ {$ifNull: ["$a", [] ] }, {$ifNull: ["$b", [] ] } ] } } }

The above will always output newArray as an array and never as null, even if a and b are both null in the input documents (in which case the output is an empty array). If only one of them is null, the output will be the other array. If you want to guard against errors when a or b may not be an array, you could add the $isArray operator.
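
The effect of the $ifNull guard can be modelled in plain JavaScript. This is a sketch for illustration only, not server code: ifNull and guardedConcat are hypothetical helper names, and the model assumes that a missing field is treated the same way as null. It does not include the $isArray check mentioned above, so non-array inputs are out of scope here.

```javascript
// Hypothetical stand-in for {$ifNull: [value, fallback]}.
// Assumption: missing (undefined) fields behave like null.
function ifNull(value, fallback) {
  return value === null || value === undefined ? fallback : value;
}

// Models the guarded projection for a single document:
// {$concatArrays: [{$ifNull: ["$a", []]}, {$ifNull: ["$b", []]}]}
function guardedConcat(doc) {
  return [...ifNull(doc.a, []), ...ifNull(doc.b, [])];
}

console.log(guardedConcat({ a: null, b: [2, 4, 5] }));
console.log(guardedConcat({ a: null, b: null }));
```

With the guard in place, a null input contributes nothing to the result instead of turning the whole result into null.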

Comment by Charlie Swanson [ 15/Jun/15 ]

The new operator is called $concatArrays. It takes any number of arguments, evaluates to null if any of the inputs are null, and raises an error if any argument is neither null nor an array. Nested arrays will not be flattened (e.g. {$concatArrays: [[1,2], [3, [4]]]} will evaluate to [1, 2, 3, [4]]).

See the test in the commit for more details.
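
The semantics described in this comment can be approximated by a small plain JavaScript model. It is a sketch under stated assumptions, not the server implementation: the function name concatArrays is hypothetical, undefined is assumed to behave like null, and the arguments are checked left to right, an ordering the comment does not specify.

```javascript
// Hypothetical model of the $concatArrays semantics described above.
function concatArrays(...inputs) {
  const result = [];
  for (const input of inputs) {
    // Assumption: a null (or undefined) input makes the whole result null.
    if (input === null || input === undefined) return null;
    // Anything that is neither null nor an array is an error.
    if (!Array.isArray(input)) {
      throw new TypeError("concatArrays only accepts arrays or null");
    }
    // Append the elements one level deep; nested arrays stay nested.
    result.push(...input);
  }
  return result;
}

console.log(JSON.stringify(concatArrays([1, 2], [3, [4]])));
console.log(concatArrays([1, 2], null));
```

Because of the assumed left-to-right check, this model returns null for (null, 5) but throws for (5, null); the real operator may resolve that case differently.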

Comment by Githook User [ 15/Jun/15 ]

Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

Message: SERVER-14872: Aggregation expression to concatenate multiple arrays into one
Branch: master
https://github.com/mongodb/mongo/commit/5717454bc3c50c9ac985dcb77639f2710dfda8e4

Comment by Jon Rangel (Inactive) [ 10/Apr/15 ]

Hi Asya, $setIntersection does not help with the use case in the original description, since the requirement is to check for duplicate IDs globally across the collection, not just within a single document. $setIntersection would throw away IDs that we actually want to keep for the subsequent $unwind and $group stages.

Agree with proposal for array $concat operator.

Comment by Asya Kamsky [ 08/Apr/15 ]

While the use case described (of finding duplicates) can be handled with set operations, there is a need for a general way of concatenating arrays, so I'd like to use this ticket for that request.

Proposed: {$project:{c:{$concat:["$a","$b"]}}} would combine the arrays a and b into a single array c.

Comment by Asya Kamsky [ 08/Apr/15 ]

Would $setIntersection help? It would show the values duplicated between a and b, and $size would show how many of them there are.
