[SERVER-7618] New aggregation expression: generator for serial numbers Created: 11/Nov/12  Updated: 06/Dec/22  Resolved: 04/Feb/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Richard Kreuter (Inactive) Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-9377 Allow collecting "top" N values for e... Closed
Assigned Teams:
Query
Participants:

 Description   

It might be nice if the aggregation framework had expressions in $project for generating things like sequential numbers.

Here's one use for such a thing: given some inputs of the form

{ A : <Avalue>, B : <Bvalue> }

produce groupings on A values, removing exactly one instance of the minimum B value per group. (AFAICT, this problem can't be solved yet in the aggregation framework.)

If $project could attach serial numbers to those inputs, then the problem could be solved by constructing a unique minimum (B, s) subdocument per group, like this:

[
/* Add a serial number to every input using a new $generate operator. */
{ $project : { A : 1, B : 1, s : { $generate : { $serial : 1 } } } },
/* Group by A, collecting (B, s) pairs and computing a minimum (B, s) value for the next stages */
{ $group : { _id:"$A", pairs:{ $push:{ B:"$B", s:"$s" } },
             min: { $min : { B:"$B" , s:"$s" } } } },
/* Unwind on the pairs so as to project&filter later. */
{ $unwind : "$pairs" },
/* Figure out if we're looking at the minimum (B, s) pair. */
{ $project : { _id:1, B:"$pairs.B", isMin:{ $eq : [ "$pairs", "$min" ] } } },
/* Filter out the isMin=true cases */
{ $match : { isMin: false } },
/* Re-group by A (which is called _id at this point) */
{ $group: { _id: "$_id", B: { $push: "$B" } } }
]
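
For readers who want to check the intended result without a server, here is a minimal plain-JavaScript sketch of what the pipeline above is meant to compute. It is an illustration only: removeOneMinPerGroup is a hypothetical helper name, the array index stands in for the made-up $generate serial number, and B values are assumed to be numbers.

```javascript
// Plain-JavaScript sketch (no MongoDB needed) of the result the pipeline aims for.
// removeOneMinPerGroup is a hypothetical helper, not a MongoDB API.
function removeOneMinPerGroup(docs) {
  // Attach a serial number s to every input (the role of the made-up $generate).
  const numbered = docs.map((doc, i) => ({ A: doc.A, B: doc.B, s: i }));

  // Group by A, collecting (B, s) pairs and tracking the minimum pair per group.
  const groups = new Map();
  for (const { A, B, s } of numbered) {
    if (!groups.has(A)) groups.set(A, { pairs: [], min: null });
    const g = groups.get(A);
    const pair = { B, s };
    g.pairs.push(pair);
    // Compare on B first, then on s, so ties on B are broken by the serial number.
    if (g.min === null || B < g.min.B || (B === g.min.B && s < g.min.s)) {
      g.min = pair;
    }
  }

  // Drop the single pair equal to the minimum, then re-collect B per group.
  const result = [];
  for (const [A, g] of groups) {
    const B = g.pairs.filter((p) => p.s !== g.min.s).map((p) => p.B);
    // A group whose only document was the minimum disappears, as it would after $match.
    if (B.length > 0) result.push({ _id: A, B });
  }
  return result;
}

const out = removeOneMinPerGroup([
  { A: "x", B: 3 }, { A: "x", B: 1 }, { A: "x", B: 1 }, { A: "y", B: 5 },
]);
console.log(JSON.stringify(out)); // [{"_id":"x","B":[3,1]}]
```

Because the serial number is unique per input, exactly one of the two B = 1 documents in group "x" is removed. That uniqueness is the property the serial number is there to provide.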

Of course there are other (and probably better) aggregation extensions that would solve this problem, but the requested feature both helps with this one and might be useful elsewhere.

(In the made-up $generate expression above, I stuck the keyword $serial in there in case it turns out to be useful to have things other than serial numbers in future, e.g., random numbers, ObjectIds, timestamps, etc.)

Doc changes: if we do it, we oughtta doc it.



 Comments   
Comment by Charlie Swanson [ 04/Feb/16 ]

I believe this could be addressed in a simpler way via SERVER-9377.

e.g. you could do something like the following (making up a syntax):

{$group: {
    _id: "$A",
    B: {
        $pushSorted: {
            value: "$B",
            sort: {B: 1},
            slice: 1
        }
    }
}}
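
As a rough illustration of the made-up syntax above, the following plain-JavaScript sketch emulates one possible reading of it: collect the values per group, sort them ascending, and keep the first `slice` entries. The function name pushSorted and its parameters are hypothetical, the interpretation of `slice` is an assumption, and values are assumed to be numeric.

```javascript
// Hypothetical emulation of a "push sorted, then slice" accumulator.
// This is not a real MongoDB operator; it only mirrors the made-up syntax.
function pushSorted(docs, groupKey, valueKey, slice) {
  // Collect the values for each group key, preserving first-seen group order.
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[groupKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc[valueKey]);
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    // Ascending numeric sort (sort: {B: 1}), then keep the first `slice` values.
    [valueKey]: values.sort((a, b) => a - b).slice(0, slice),
  }));
}

const grouped = pushSorted(
  [{ A: "x", B: 3 }, { A: "x", B: 1 }, { A: "y", B: 5 }],
  "A", "B", 1
);
console.log(JSON.stringify(grouped));
```

Under that reading, slice: 1 keeps the smallest B per group; a different slice convention would be needed to keep everything except the smallest.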

Closing as a duplicate. Feel free to let me know if you disagree.

Comment by Charlie Swanson [ 04/Feb/16 ]

asya, I believe this is separate from SERVER-20169. You couldn't use a $range to express this. You would have to know the number of matching documents up front, and even then I'm not sure this would get you what you want.

Generated at Thu Feb 08 03:15:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.