[SERVER-14872] Aggregation pipeline project expression operator to concatenate multiple arrays into one
Created: 12/Aug/14 Updated: 05/Feb/16 Resolved: 15/Jun/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.5 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jon Rangel (Inactive) | Assignee: | Charlie Swanson |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Quint Iteration 4, Quint Iteration 5 |
| Description |
I need to count the number of duplicate IDs that occur across all documents in a collection. However, this otherwise simple aggregation is complicated by the fact that the IDs in question may appear in multiple fields of the input document. Suppose I have input documents like the one below:
I'd like to transform this into a document that looks like:
From this point, it is straightforward to $unwind and $group on c to get the desired result. The above transformation is similar to what the $setUnion operator would achieve, but here I do not want duplicate values filtered out. This request is therefore for an operator similar to $setUnion that does not filter out duplicate values.
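A minimal sketch of the approach described above, written as a mongo shell pipeline that uses $concatArrays, the operator that eventually resolved this ticket (see Charlie Swanson's comment). The field names a, b and c follow the comments in this thread; the count field, the $match threshold and the sample documents are illustrative assumptions. The pipeline array would be passed to db.collection.aggregate(...), while findDuplicateIds is a hypothetical plain-JavaScript emulation of the same stages so the logic can be checked without a server.

```javascript
// Sketch of the pipeline from the description, assuming each document keeps
// its IDs in two array fields, a and b.
const duplicateIdPipeline = [
  // Concatenate a and b into c, keeping duplicates (unlike $setUnion).
  { $project: { c: { $concatArrays: ["$a", "$b"] } } },
  // Emit one document per element of c.
  { $unwind: "$c" },
  // Count how often each ID occurs across the whole collection.
  { $group: { _id: "$c", count: { $sum: 1 } } },
  // Keep only IDs that occur more than once.
  { $match: { count: { $gt: 1 } } },
];

// Hypothetical offline emulation of the same four stages over plain objects.
// Assumes every document has arrays in both a and b and that IDs are
// primitives (Map keys compare by value only for primitives).
function findDuplicateIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const c = [...doc.a, ...doc.b]; // $project with $concatArrays
    for (const id of c) {
      // $unwind, then $group with {$sum: 1}
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
  }
  return [...counts]
    .filter(([, count]) => count > 1) // $match
    .map(([id, count]) => ({ _id: id, count }));
}

const sampleDocs = [
  { a: [1, 2], b: [2, 3] },
  { a: [3], b: [4] },
];
console.log(findDuplicateIds(sampleDocs));
// [ { _id: 2, count: 2 }, { _id: 3, count: 2 } ]
```

The server does not guarantee the order of $group output, so only the emulation's ordering (first occurrence) is deterministic.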
| Comments |
| Comment by Asya Kamsky [ 09/Sep/15 ] |
Just highlighting this behavior and adding a note for the documentation: if the quoted behavior (a null result whenever any input is null) is not desired, i.e. if you want concatenating a null field with an array field to yield just the array, then you should add $ifNull:
The above will always output newArray as an array and never as null, even if a and b are both null in the input documents. If one of them is null, the output will be the other array. If you want to guard against an error when a or b may not be an array, you could add the $isArray operator.
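A minimal sketch of the $ifNull guard this comment describes, plus the $isArray variant it mentions. The field names newArray, a and b come from the comment; the exact expressions are assumptions and not necessarily the ones originally posted. newArrayFor is a hypothetical emulation of both expressions for offline checking, with `undefined` standing in for a missing field.

```javascript
// Null-safe form: a null or missing a/b becomes [], so newArray is always an
// array as long as a and b are otherwise arrays.
const nullSafeProject = {
  $project: {
    newArray: {
      $concatArrays: [{ $ifNull: ["$a", []] }, { $ifNull: ["$b", []] }],
    },
  },
};

// Stricter form: anything that is not an array (including null or a missing
// field) becomes [], so $concatArrays never sees a non-array operand.
const arraySafeProject = {
  $project: {
    newArray: {
      $concatArrays: [
        { $cond: [{ $isArray: "$a" }, "$a", []] },
        { $cond: [{ $isArray: "$b" }, "$b", []] },
      ],
    },
  },
};

// Hypothetical offline emulation of both expressions for a single document.
function newArrayFor(doc, guardNonArrays = false) {
  const pick = guardNonArrays
    ? (v) => (Array.isArray(v) ? v : []) // $cond + $isArray
    : (v) => (v === null || v === undefined ? [] : v); // $ifNull
  const operands = [pick(doc.a), pick(doc.b)];
  for (const op of operands) {
    // Stand-in for the server-side error, not its actual code or message.
    if (!Array.isArray(op)) throw new TypeError("$concatArrays operand is not an array");
  }
  return [...operands[0], ...operands[1]];
}

console.log(newArrayFor({ a: null, b: null })); // []
console.log(newArrayFor({ a: [1, 2] })); // [ 1, 2 ]
console.log(newArrayFor({ a: "x", b: [3] }, true)); // [ 3 ]
```

The $isArray form subsumes the $ifNull form, at the cost of silently dropping malformed values instead of surfacing an error.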
| Comment by Charlie Swanson [ 15/Jun/15 ] |
The new operator is called $concatArrays. It takes any number of arguments, evaluates to null if any of the inputs are null, and errors if any argument is neither null nor an array. Nested arrays are not flattened (e.g. {$concatArrays: [[1,2], [3, [4]]]} evaluates to [1, 2, 3, [4]]). See the test in the commit for more details.
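The rules stated in this comment can be sketched as a small plain-JavaScript function. This is a hypothetical emulation for illustration, not the server implementation: treating `undefined` (a missing field) like null and evaluating operands left to right are assumptions of the sketch, and the thrown error is a stand-in.

```javascript
// Hypothetical emulation of the $concatArrays rules described above.
function concatArrays(...args) {
  const result = [];
  for (const arg of args) {
    if (arg === null || arg === undefined) {
      return null; // any null (or missing) input makes the whole result null
    }
    if (!Array.isArray(arg)) {
      // Stand-in error; not the server's actual code or message.
      throw new TypeError("$concatArrays only supports arrays");
    }
    // Append elements one level deep; nested arrays stay nested.
    result.push(...arg);
  }
  return result;
}

console.log(concatArrays([1, 2], [3, [4]])); // [ 1, 2, 3, [ 4 ] ]
console.log(concatArrays([1, 2], null)); // null
```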
| Comment by Githook User [ 15/Jun/15 ] |
Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>
Message:
| Comment by Jon Rangel (Inactive) [ 10/Apr/15 ] |
Hi Asya, $setIntersection does not help with the use case in the original description, since the requirement is to check for duplicate IDs globally across the collection, not just within a single document. $setIntersection would throw away IDs that we actually want to keep for the subsequent $unwind and $group stages. I agree with the proposal for an array $concat operator.
| Comment by Asya Kamsky [ 08/Apr/15 ] |
While the use case described (finding duplicates) can be handled with set operations, there is a need for a general way of concatenating arrays, so I'd like to use this ticket for that request. Proposed: {$project:{c:{$concat:["$a","$b"]}}} would combine arrays a and b into array c.
| Comment by Asya Kamsky [ 08/Apr/15 ] |
Would $setIntersection help? It will show the values shared between a and b, and $size will show how many are duplicated.
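A minimal sketch of this $setIntersection/$size suggestion, assuming array fields a and b; the output field sharedCount and the helper sharedCount are hypothetical. The second example illustrates the limitation raised in Jon Rangel's reply: an ID repeated across different documents is invisible to a per-document intersection.

```javascript
// Per-document count of distinct IDs present in both a and b
// (assumes both fields are arrays; $size errors on a non-array).
const sharedCountProject = {
  $project: { sharedCount: { $size: { $setIntersection: ["$a", "$b"] } } },
};

// Hypothetical offline emulation. JavaScript Set membership matches BSON
// equality only for primitive values such as numbers and strings.
function sharedCount(doc) {
  const inB = new Set(doc.b);
  // $setIntersection yields distinct values, hence the outer Set.
  return new Set(doc.a.filter((id) => inB.has(id))).size;
}

console.log(sharedCount({ a: [1, 2, 2, 3], b: [2, 3, 4] })); // 2

// ID 1 appears in both documents, yet each per-document count is 0.
const twoDocs = [
  { a: [1, 2], b: [3] },
  { a: [1], b: [4] },
];
console.log(twoDocs.map(sharedCount)); // [ 0, 0 ]
```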