[SERVER-9625] Support many accumulators in aggregation framework as expressions Created: 08/May/13  Updated: 28/Oct/15  Resolved: 11/Aug/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.4.3
Fix Version/s: 3.1.7

Type: Improvement Priority: Major - P3
Reporter: Samuel Chou Assignee: James Cohan
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-10626 Support numeric arrays as input to $a... Closed
is duplicated by SERVER-14359 Allow $max, $min, $avg calculations ... Closed
is duplicated by SERVER-19166 $stdDev should be available as an exp... Closed
is duplicated by SERVER-14565 New aggregation operators Closed
Related
related to CSHARP-1373 Support many accumulators in aggregat... Closed
is related to DRIVERS-234 Aggregation Builder Support for 3.2 Closed
Backwards Compatibility: Fully Compatible
Sprint: Quint Iteration 7, QuInt 8 08/28/15
Participants:

 Description   

Make the following accumulators available as expressions: $stdDevSamp, $stdDevPop, $avg, $min, $max, and $sum.

As expressions, available in the $project stage, these operators can take either:

  • A single argument: <accumulator> : <arg>
  • Multiple arguments: <accumulator> : [ <arg1>, <arg2>, ... ]
Single Argument: <arg>

If the argument is an array, the accumulator operates on the elements of the array to return a single value. If the argument is not an array, the accumulator operates on the single argument.

Multiple arguments: [ <arg1>, <arg2>, ... ]

The accumulator operates on the arguments to return a single value.

If any of the argments is an array, the accumulator does not traverse into the array elements.

Non-numerical arguments

When used as part of the $project stage, $sum, $avg, $stdDevSamp, and $stdDevPop ignore non-numerical arguments.

Original request:

{'$group': {
    '_id': {
        'grp_0': '$ses.usr',
        'grp_1': '$ses._id'
    },
    'agg_0': {'$sum': {'$subtract': [
        {'$cond': [{'$lt': ['$ses.e', E]},
                   '$ses.e',
                   E]},
        {'$cond': [{'$gt': ['$ses.s', S]},
                   '$ses.s',
                   S]}
        ]}
     }
}}

you can write:

{'$group':
  {
    '_id': {
      'grp_0': '$ses.usr',
      'grp_1': '$ses._id'
    },
    'agg_0': {'$sum': {'$subtract': [
      {'$min': ['$ses.e', E]}, 
      {'$max': ['$ses.s', S]}
     ]}
    }
  }
}



 Comments   
Comment by Githook User [ 14/Aug/15 ]

Author:

{u'username': u'jamesfcohan', u'name': u'James Cohan', u'email': u'james.cohan@10gen.com'}

Message: SERVER-9625 Logging arguments of failed test case in expression unit tests
Branch: master
https://github.com/mongodb/mongo/commit/e73b7d634c66e73f80a4ba1521354de00a0db885

Comment by Charlie Swanson [ 14/Aug/15 ]

cc jeff.yemin, we might need driver changes for this.

Comment by Githook User [ 11/Aug/15 ]

Author:

{u'username': u'jamesfcohan', u'name': u'James Cohan', u'email': u'james.cohan@10gen.com'}

Message: SERVER-9625 Makes $sum, $avg, $min, $max, $stdDevPop, and $stdDevSamp accumulators available as expressions
Branch: master
https://github.com/mongodb/mongo/commit/f9828f5a0312801fa5b8592b31648a64f0ea7f67

Comment by Charlie Swanson [ 20/May/15 ]

Expanding the scope of this to include $avg as well.

Comment by Mathias Stearn [ 09/May/13 ]

You can simplify your code a bit by defining these helper functions in python:

def agg_min(expr1, expr2):
    return {'$cond': [{'$lt': [expr1, expr2]}, # if expr1 < expr2
                      expr1, # then expr1
                      expr2]} # else expr2
 
def agg_max(expr1, expr2):
    return {'$cond': [{'$gt': [expr1, expr2]}, # if expr1 > expr2
                      expr1, # then expr1
                      expr2]} # else expr2

Comment by Samuel Chou [ 08/May/13 ]

Hi Andy,

Thanks. Your workaround works for us, but for sure we'd love this to still be implemented. An example of an actual query we run (I simplified the original example):

queries = self.db.SummaryLogs.aggregate([
  match,
  {'$unwind': '$cat'},
  match,
  {'$group':
    {
      '_id': {
        'grp_0': '$cat',
        'grp_1': '$dom',
        'grp_2': '$ses.usr'
      },
      'agg_0': {'$sum': '$sz'},
      'agg_1': {'$sum': 1},
      # https://jira.mongodb.org/browse/SERVER-9625 - $min and $max to make this cleaner eventually
      'agg_2': {'$sum':
        {'$cond': 
          [{'$gt': [
              {'$subtract': [
                {'$cond': [{'$lt': ['$ses.e', self.options['end']]}, '$ses.e', self.options['end']]},
                {'$cond': [{'$gt': ['$ses.s', self.options['start']]}, '$ses.s', self.options['start']]}
              ]},
              0
            ]},
          {'$subtract': [
            {'$cond': [{'$lt': ['$ses.e', self.options['end']]}, '$ses.e', self.options['end']]},
            {'$cond': [{'$gt': ['$ses.s', self.options['start']]}, '$ses.s', self.options['start']]}
          ]},
          0
        ]}
      }
    }
  }
])

Comment by Andy Schwerin [ 08/May/13 ]

The problem you're actually experiencing is that "$min" and "$max" only act as accumulators, not as document expression operators. It would be a good feature to support $min and $max in document expressions.

For the time being, the following should do what you're looking for.

{'$group': {
    '_id': {
        'grp_0': '$ses.usr',
        'grp_1': '$ses._id'
    },
    'agg_0': {'$sum': {'$subtract': [
        {'$cond': [{'$lt': ['$ses.e', end]},
                   '$ses.e',
                   end]},
        {'$cond': [{'$gt': ['$ses.s', start]},
                   '$ses.s',
                   start]}
        ]}
     }
}}

Generated at Thu Feb 08 03:20:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.