[SERVER-23839] Aggregation accumulators does not allow the new "direct" array notation Created: 21/Apr/16  Updated: 06/Dec/22  Resolved: 30/May/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Neil Lunn Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-16284 Allow for $pushall / $push with $each... Open
Assigned Teams:
Query
Operating System: ALL
Steps To Reproduce:

Insert some data

    db.test.insert({ "a": 1, "b": 2 })

Use the array expression notation to result in an array of arrays after $push

    db.test.aggregate([
       { "$group": {
           "_id": null,
           "data": { "$push": ["$a","$b"] }
       }}
   ])

Produces an error, but $project and other operators allow it

    db.test.aggregate([
       { "$project": {
           "data": ["$a","$b"]
       }}
   ])

But not accumulators of any kind;

    db.test.aggregate([
       { "$group": {
           "_id": null,
           "data": { "$first": ["$a","$b"] }
       }}
   ])

This produces the same error due to the argument being in array notation:

"aggregating group operators are unary"

So in order to get an "array" you still need $map:

    db.test.aggregate([
       { "$group": {
           "_id": null,
           "data": { 
             "$first": {
               "$map": {
                 "input": ["A","B"],
                 "as": "el",
                 "in": {
                   "$cond": [
                     { "$eq": [ "$$el", "A" ] },
                     "$a",
                     "$b"
                   ]
                }
             }
           }
       }}
   ])

Which yields an array in response, but since it is not directly notated as such the expression is allowed.

Desired output from $push

    { "_id": null, "data": [[1,2]] }

Participants:

 Description   

MongoDB 3.2 allows direct notation of arrays as an expression, and is usually fine in most constructs. ie.

 
    db.test.aggregate([
       { "$project": {
           "data": ["$a","$b"]
       }}
   ])

However, when you attempt this notation as an argument to $push you get an error:

 
    db.test.aggregate([
       { "$group": {
           "_id": null,
           "data": { "$push": ["$a","$b"] }
       }}
   ])

So the case still requires transposition of elements using $map as an argument to $push in order to get the desired result:

 
    db.test.aggregate([
       { "$group": {
           "_id": null,
           "data": { 
             "$push": {
               "$map": {
                 "input": ["A","B"],
                 "as": "el",
                 "in": {
                   "$cond": [
                     { "$eq": [ "$$el", "A" ] },
                     "$a",
                     "$b"
                   ]
                }
             }
           }
       }}
   ])

Where $push is actually happy that the expression can be evaluated, even though the returned element is an array.

Though the error itself is self explanatory and there are cases where this would avert undesired output, it is not really consistent with the usage in other constructs.

It might therefore be desirable to just allow the array expression to be applied, but then only possibly error for things like $sum, or maybe even in that case just produce the 0 return value.

So overall unsure if it is best to error in such a case or just produce what may be unexpected results if the usage was not intended that way. For cases where the intended result was indeed array content to be sent to the accumulator, then there is an inconsistency with usage in other places.



 Comments   
Comment by Asya Kamsky [ 30/May/19 ]

It seems like in spirit this is similar (or flip side) of problem described in SERVER-16284 so I'm closing it as duplicate of that ticket.

Comment by Charlie Swanson [ 02/May/16 ]

So just to confirm, with something like the proposed syntax in SERVER-9377, you would then be directly notating an array since that accumulator accepts two arguments?:

 
    db.test.aggregate([
        { "$group": {
            "_id": null,
            "data": { "$pushFirstN": [ ["$x","$y"], 10 ] }
       }}

Yes, this would specify an array literal with two elements. The other example (wrapping ["$x", "$y"] in another set of brackets), would be an array literal with one element, which is an array.

I'm not sure I'm following your latest comment. Would you mind elaborating on the feature request?

Comment by Neil Lunn [ 24/Apr/16 ]

Then again, considering something like:

 
db.test.aggregate([
  { "$group": {
    "_id": null,
    "data": { 
      "$push": { 
        "$setUnion": [ 
          [{ "a": "$a", "b": "$b" }],
          [{ "a": "$b", "b": "$a" }]
        ]
      }
    }
  }}
])

Makes sense in that syntax and result, but that does bring a feature request to mind as well.

Comment by Neil Lunn [ 24/Apr/16 ]

I can live with that. The notation makes logical sense as long as it's consistent. So

 
    db.test.aggregate([
        { "$group": {
            "_id": null,
           "data": { "$push": [["$x","$y"]] }
         }}
    ])

Actually has a nice feel to it.

So just to confirm, with something like the proposed syntax in SERVER-9377, you would then be directly notating an array since that accumulator accepts two arguments?:

 
    db.test.aggregate([
        { "$group": {
            "_id": null,
            "data": { "$pushFirstN": [ ["$x","$y"], 10 ] }
       }}

Or would it still make more sense to include the explicit notation?:

 
    db.test.aggregate([
        { "$group": {
            "_id": null,
            "data": { "$pushFirstN": [ [["$x","$y"]], 10 ] }
       }}

I've been watching SERVER-9377 since it does address some specific problem cases not currently addressed in single aggregation statements. So it would be nice to get that right, and handle expressions consistently. I actually think the latter makes more sense here and would keep it consistent with usage in other expressions.

Comment by Charlie Swanson [ 21/Apr/16 ]

Hi neil@mylunn.id.au,

This is a legitimate request, but we will have to be careful about it. As proposed in SERVER-9377, we're going to be adding accumulators that take more than one argument. At least in those accumulators, we will have to be careful to make sure it is not ambiguous what is meant by an array literal. Is it multiple arguments, or is it one argument, which is an array?

The way this works for expression is somewhat weird, but I find it makes sense if you think about it. For expressions which take only one argument, you are allowed to omit wrapping the one argument in an array literal. For those that take more than one argument, you must specify the arguments in an array. This leads to this somewhat odd behavior:

> db.foo.aggregate([{$project: {x: {$size: [[1, 2]]}}}])
{ "_id" : ObjectId("5717d5b230b720f2cabdf0ab"), "x" : 2 }  // The size of the array [1, 2].
 
> db.foo.aggregate([{$project: {x: {$size: [1, 2]}}}])  // Interpreted as two arguments.
assert: command failed: {
	"ok" : 0,
	"errmsg" : "Expression $size takes exactly 1 arguments. 2 were passed in.",
	"code" : 16020
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:23:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1215:5
@(shell):1:1
 
 
> db.foo.aggregate([{$project: {_id: 0, x: {$size: [[1, 2], [1, 2]]}}}])  // Still interpreted as two arguments.
assert: command failed: {
	"ok" : 0,
	"errmsg" : "Expression $size takes exactly 1 arguments. 2 were passed in.",
	"code" : 16020
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:23:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1215:5
@(shell):1:1
> db.foo.aggregate([{$project: {_id: 0, x: {$size: [[[1, 2], [1, 2]]]}}}])  // When wrapped in an array, it's treated as one argument.
{ "x" : 2 }  // Size of [[1,2], [1,2]].

We might have to / want to do the same thing for accumulators, so that if you actually wanted an array, you'd have to wrap it in another array.

Generated at Thu Feb 08 04:04:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.