[SERVER-17258] Add $reduce expression operator for rollup of arrays Created: 11/Feb/15  Updated: 13/Dec/17  Resolved: 21/Apr/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.3.5

Type: New Feature Priority: Major - P3
Reporter: Jon Rangel (Inactive) Assignee: Benjamin Murphy
Resolution: Done Votes: 9
Labels: expression, neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by CSHARP-1662 Add $reduce expression operator for r... Closed
Documented
is documented by DOCS-8580 Create 3.4 Manual Page for $reduce Closed
Related
related to SERVER-15697 Create a string accumulator for aggre... Backlog
related to SERVER-14359 Allow $max, $min, $avg calculations ... Closed
related to SERVER-10682 Expose $multiply as both an accumulat... Backlog
related to SERVER-10676 $multiply does not support a referenc... Closed
related to SERVER-14872 Aggregation pipeline project expressi... Closed
is related to DRIVERS-297 Aggregation Framework Support for 3.4 Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 13 (04/22/16)
Participants:
Case:

 Description   

Syntax

{
    $reduce: {
        input: <array>,
        initialValue: <expression>,
        in: <expression>
    }
}

Examples

> db.coll.insert([
  {_id: 1, array_field: [1,2,3]}
  {_id: 2, array_field: []}
]);
> db.coll.aggregate([{
    $project: {
        total: {
            $reduce: {  // Use $reduce to compute the sum.
                input: "$array_field",
                initialValue: 0,
                in: {$add : ["$$value", "$$this"]}
            }
        }
    }
}])
{_id: 1, total: 6}
{_id: 2, total: 0}
 
 
// Example 2
> db.coll.insert([
  {_id: 1, array_field: [1,2,3]}
  {_id: 2, array_field: []}
]);
> db.coll.aggregate([{
  $project: {
    array_totals: {
      $reduce: {  // Compute both the sum and the product.
        input: "$array_field",
        initialValue: {sum: 0, product: 1},
        in: {
          sum: {$add : ["$$value.sum", "$$this"]},
          product: {$multiply: ["$$value.product", "$$this"]}
        }
      }
    }
  }
}])
{_id: 1, array_totals: {sum: 6, product: 6}}
{_id: 2, array_totals: {sum: 0, product: 1}}

Notes

  • If input is an empty array, the value given to initialResult is returned.

Errors

  • If a value from the input expression is anything but a single array value.
  • 'input', 'initialResult', and 'in' are required options.

Old Description
We currently have the $map operator to apply an expression to each element in an array and return an array with the applied results.

Along similar lines, it would also be useful to have a $reduce operator to rollup an array into a scalar element using a specified combining function. For example, to return the sum of all array elements.

To illustrate:

Given an input of:

{ _id:1, array_field: [1,2,3] }
{ _id:2, array_field: [] }

We should return:

{ _id:1, array_total: 6 }
{ _id:2, array_total: 0 }

This is currently possible using a relatively expensive $unwind and $group (plus some additionally voodoo to handle unwinding of the empty arrays), but it would be more elegant, and likely more performant, to do something like the following:

db.coll.aggregate(
   [
      { $project:
         { array_total:
            {
              $reduce:
                 {
                   $add: "$array_field",
                 }
            }
         }
      }
   ]
)

[Note: the above proposed syntax is just for illustrative purposes]



 Comments   
Comment by Benjamin Murphy [ 21/Apr/16 ]

This ticket introduces the $reduce operator, which should be documented as an aggregation expression. In addition, any drivers that provide aggregation helpers should be updated to include support for $reduce.

Comment by Githook User [ 21/Apr/16 ]

Author:

{u'username': u'benjaminmurphy', u'name': u'Benjamin Murphy', u'email': u'benjamin_murphy@me.com'}

Message: SERVER-17258 Aggregation now supports the reduce expression.
Branch: master
https://github.com/mongodb/mongo/commit/0dfb47dd678b872a425f26c073e72652bbcde90e

Comment by Asya Kamsky [ 29/Apr/15 ]

antoine.hom@amadeus.com it looks likely that this feature (at least some of the variants of it, sum, min, max, etc) will be scheduled for the next major release, so I don't think it's worthwhile for you to spend any effort on it.

Comment by Antoine Hom [ 14/Apr/15 ]

Hey,

I was wondering if that would help get this going if I made a pull request related to this ticket.
I could implement the $sum\$max\$min\$product, but would like to have asya 's inputs to know if it's worthwhile that I invest a bit of time into it.

Comment by Antoine Hom [ 23/Mar/15 ]

After looking at our tests it was indeed a slightly different scenario than the one here.
Anyway, just to make this ticket clear, I think that at least $sum/$max/$min in the $project stage would be very appreciated features both in terms of query complexity and performances.
And I think it is reasonable to expect the same behavior as the equivalent operators in the $group stage with regards to type coercion.

Comment by Asya Kamsky [ 21/Mar/15 ]

You should be able to run [

{ .. pipeline .. }

],

{explain:true}

to see what fields would be passed through - with project first, the pipeline will go with what you explicitly specify, without it it will pass through minimum necessary fields. It would be unexpected for the two to be different duration executions, but would be good to know if they are.

Comment by Antoine Hom [ 21/Mar/15 ]

Hey Asya,

A simple $sum is also completely fine, note that I also have examples where $max/$min would be very helpful and avoid unnecessary $unwind/$group, I would gladly post them in their short versions if you want to see them.

For the other $group operators, we can either work around them ($avg => $sum/$size) or they would be better answered by other feature requests [$first/$last => $slice SERVER-6074] [$push => $concat SERVER-14872]

IIRC, we are always adding a project as soon as possible in our pipelines because we detected some difference in performances. I will take a look at it on monday when I have access to the tests we did in the past but our setup is quite unusual so that may be related to that.

Cheers,
Antoine

Comment by Asya Kamsky [ 20/Mar/15 ]

Please note that your aggregation pipeline is already longer than it needs to be as you have $project stages that don't actually compute any new fields. I recommend that you remove those as they are unnecessary (aggregation will only pass through the fields it knows will be needed) and may in some cases slow things down.

With proposed $sum your pipeline would become:

[
    { $unwind: "$CPN" },
    {
        $project: {
            Ev: 1,
            CPN: {
                CpnNb: 1,
                "TXO>PAmnt": { "$sum": "$CPN.TXO.PAmnt" },
                "TXA>PAmnt": { "$sum": "$CPN.TXA.PAmnt" }
            }
        }
    },
    {
        $group: {
            _id: {
                Ev: "$Ev",
                "CPN>CpnNb": "$CPN.CpnNb"
            },
            "CPN>TXO>PAmnt": { "$sum": "$CPN.TXO>PAmnt" },
            "CPN>TXA>PAmnt": { "$sum": "$CPN.TXA>PAmnt" }
        }
    }
]

or if you prefer terser syntax even:

[
    { $unwind: "$CPN" },
    {
        $group: {
            _id: {
                Ev: "$Ev",
                "CPN>CpnNb": "$CPN.CpnNb"
            },
            "CPN>TXO>PAmnt": { "$sum":  { "$sum" : "$CPN.TXO.PAmnt" } },
            "CPN>TXA>PAmnt": { "$sum":  { "$sum" : "$CPN.TXA.PAmnt" } }
        }
    }
]

Comment by Asya Kamsky [ 20/Mar/15 ]

Would it be sufficient to have a projection operator $sum to $add an array of values? We would probably have an analogous $product for $multiply.

Comment by Antoine Hom [ 20/Mar/15 ]

Here is one concrete example with how we can answer it right now and how it would be answered with such a $reduce operator:

Sample query:

SELECT 
    Ev, CPN.CpnNb, Sum(CPN.TXA.PAmnt), Sum(CPN.FAR.PAmnt)
GROUP BY 
    Ev, CPN.CpnNb;

Sample document:

{
    Ev: "ISS",
    CPN: [
        {
            CpnNb: 1,
            TXA: [
                {
                    Typ: "YQ",
                    PAmnt: 10.0
                }
            ],
            FAR: [
                {
                    Typ: "YR",
                    PAmnt: 21.3
                }
            ]
        }
    ]
}

Aggregation without reduce:

Issues are:

  • Unnecessary unwind/group which can be very expensive performance wise
  • Unnecessary query complexity

[
    {
        $project: {
            Ev: 1,
            CPN: {
                CpnNb: 1,
                TXA: {
                    Typ: 1,
                    PAmnt: 1
                },
                FAR: {
                    Typ: 1,
                    PAmnt: 1
                }
            }
        }
    },
    { $unwind: "$CPN" },
    {
        $project: {
            Ev: 1,
            CPN: {
                CpnNb: 1,
                TXA: { $ifNull: ["$CPN.TXA", [{}]] },
                TXO: { $ifNull: ["$CPN.TXO", [{}]] }
            }
        }
    },
    { $unwind: "$CPN.TXA" },
    {
        $group: {
            _id: {
                Ev: "$Ev",
                "CPN>CpnNb": "$CPN.CpnNb"
            },
            "CPN>TXO": { "$first": "$CPN.TXO" },
            "CPN>TXA>PAmnt": { "$sum": "$CPN.TXA.PAmnt" }
        }
    },
    { $unwind: "$CPN>TXO" },
    {
        $group: {
            _id: "$_id",
            "CPN>TXA>PAmnt": { "$first": "$CPN>TXA>PAmnt" },
            "CPN>TXO>PAmnt": { "$sum": "$CPN>TXO.PAmnt" }
        }
    }
]

Aggregation with reduce:

[
    {
        $project: {
            Ev: 1,
            CPN: {
                CpnNb: 1,
                TXA: {
                    Typ: 1,
                    PAmnt: 1
                },
                FAR: {
                    Typ: 1,
                    PAmnt: 1
                }
            }
        }
    },
    { $unwind: "$CPN" },
    {
        $project: {
            Ev: 1,
            CPN: {
                CpnNb: 1,
                "TXO>PAmnt": { $reduce: { $sum: "$CPN.TXO.PAmnt" } },
                "TXA>PAmnt": { $reduce: { $sum: "$CPN.TXA.PAmnt" } }
            }
        }
    },
    {
        $group: {
            _id: {
                Ev: "$Ev",
                "CPN>CpnNb": "$CPN.CpnNb"
            },
            "CPN>TXO>PAmnt": { "$sum": "$CPN.TXO>PAmnt" },
            "CPN>TXA>PAmnt": { "$sum": "$CPN.TXA>PAmnt" }
        }
    }
]

Generated at Thu Feb 08 03:43:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.