[SERVER-6199] $first and $last accumulators accept null, missing and undefined Created: 25/Jun/12  Updated: 06/Dec/22  Resolved: 11/Aug/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Aaron Staple Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-6471 aggregation $min uses Value ordering ... Closed
Assigned Teams:
Query
Participants:

 Description   

Many accumulators ignore null-ish values, but $first/$last do not.

Observed behavior: If the first or last document in a group has no value for the field, the $first / $last accumulator evaluates to undefined and the output field is dropped.
Expected behavior: Potentially the first / last non-undefined value could be chosen instead.

Test:

c = db.c;
c.drop();
 
c.save( {} );
c.save( { a:1 } );
c.save( {} );
 
// The 'a' field of the first document is undefined, so no 'z' result.
printjson( c.aggregate( { $group:{ _id:0, z:{ $first:'$a' } } } ) );
// The 'a' field of the last document is undefined, so no 'z' result.
printjson( c.aggregate( { $group:{ _id:0, z:{ $last:'$a' } } } ) );
// The 'z' result is [1], the undefined values of the first and last documents are excluded.
printjson( c.aggregate( { $group:{ _id:0, z:{ $push:'$a' } } } ) );



 Comments   
Comment by Charlie Swanson [ 11/Aug/20 ]

This is intentional behavior as outlined in some comments above. Closing.

Comment by Asya Kamsky [ 03/Aug/20 ]

Flagging for triage to close as "WAD" (works as designed).

Comment by Charlie Swanson [ 27/Nov/17 ]

That sounds reasonable to me - maybe bump it back to 'Needs Triage' to make sure the whole query team is on the same page. It sounds like the strange behavior described in this example could be avoided by adding a $match before the $group to filter out documents without a value for 'a'.
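
A minimal sketch of that workaround, assuming the three-document test collection from the description. The `$exists` filter is only one possible way to express "has a value for 'a'": it still lets an explicit null through, and `{ a: { $ne: null } }` would exclude both null and missing. The in-memory part below is a plain JavaScript stand-in added for illustration, not how the server evaluates a pipeline.

```javascript
// Pipeline as it could be passed to c.aggregate(...) in the shell.
const pipeline = [
  { $match: { a: { $exists: true } } },        // keep only documents that have an 'a' field
  { $group: { _id: 0, z: { $first: '$a' } } }  // $first now sees only documents with 'a'
];

// Illustrative in-memory model of the two stages (an assumption, not server code).
const docs = [{}, { a: 1 }, {}];

// Stage 1: $match with $exists: true keeps documents where the key is present.
const matched = docs.filter(d => 'a' in d);

// Stage 2: $group with $first takes the value from the first remaining document;
// the output field is only set when that value is not undefined.
const grouped = { _id: 0 };
if (matched.length > 0 && matched[0].a !== undefined) {
  grouped.z = matched[0].a;
}

console.log(JSON.stringify(pipeline));
console.log(grouped);
```

With the filter in place, the first document reaching $group is { a: 1 }, so 'z' is populated instead of being dropped.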

Comment by Asya Kamsky [ 23/Nov/17 ]

This seems WAD - maybe should be closed as such?
charlie.swanson what do you think?

Comment by Mathias Stearn [ 19/Oct/12 ]

So there is a potential issue here. Consider the following pipeline:

db.cities.aggregate({$sort: {population: -1}}
                   ,{$group: {_id: '$state'
                             ,biggestCityName: {$first: '$city'}
                             ,biggestCitySize: {$first: '$population'}
                             }})

I think there is an assumption that biggestCityName and biggestCitySize would come from the same city, which would no longer be guaranteed if we made this change. Maybe we should support a way to make it conditional whether an input document is considered for a $group field. Ideally the same logic should be usable by $min, $max and all other operations as well. Also, you probably want the ability to make the conditions per-field, so you can have {$avg: '$population'} both including and excluding empty cities. I need to think more about how the syntax for this should look because I'm not having any good ideas yet.
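
The concern can be illustrated with a small plain JavaScript model. Everything here is hypothetical: the sample data, and the helper names `firstAsIs` and `firstNonNullish`, which stand in for the current $first behavior and for the proposed null-skipping behavior. It is a sketch of the semantics, not the server's implementation.

```javascript
// Hypothetical documents for one state, already sorted by population descending.
// The largest entry has no 'city' field.
const cities = [
  { state: 'NY', population: 900 },
  { state: 'NY', city: 'Smallville', population: 100 }
];

// Current semantics: take the field from the first document, even if it is missing.
function firstAsIs(docs, field) {
  return docs.length > 0 ? docs[0][field] : undefined;
}

// Proposed semantics: skip documents where the field is null or missing.
function firstNonNullish(docs, field) {
  const hit = docs.find(d => d[field] !== undefined && d[field] !== null);
  return hit === undefined ? undefined : hit[field];
}

// Both fields come from the same (first) document; the name is simply absent.
const current = {
  biggestCityName: firstAsIs(cities, 'city'),
  biggestCitySize: firstAsIs(cities, 'population')
};

// Each field is resolved independently, so they can come from different documents.
const proposed = {
  biggestCityName: firstNonNullish(cities, 'city'),
  biggestCitySize: firstNonNullish(cities, 'population')
};

console.log(current);
console.log(proposed);
```

Under the proposed semantics the name is taken from the second document and the size from the first, so the pair no longer describes a single city.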

Generated at Thu Feb 08 03:11:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.