[SERVER-60977] Make $group _id behavior with null and missing more consistent
Created: 25/Oct/21  Updated: 06/Dec/22  Resolved: 26/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Katherine Wu (Inactive)
Assignee: Backlog - Query Optimization
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53626 Minimize index scanning when retrievi... Backlog
Duplicate
duplicates SERVER-21992 Inconsistent results when grouping by... Backlog
Assigned Teams:
Query Optimization
Participants:

Description

When the $group _id is a multi-field document, $group distinguishes between null and missing values. In the other cases (when the _id is not a document, or is a single-field document), it conflates the two:

> db.test.find()
{ "_id" : 0, "a" : 1, "b" : 0 }
{ "_id" : 1, "a" : null, "b" : 1 }
{ "_id" : 2, "b" : 2 }
{ "_id" : 3, "a" : null, "b" : 2 }
> db.test.aggregate([{$group: {_id: {a: "$a"}}}])
{ "_id" : { "a" : 1 } }
{ "_id" : { "a" : null } }
> db.test.aggregate([{$group: {_id: {a: "$a", b: "$b"}}}])
{ "_id" : { "b" : 2 } }
{ "_id" : { "a" : 1, "b" : 0 } }
{ "_id" : { "a" : null, "b" : 1 } }
{ "_id" : { "a" : null, "b" : 2 } }

This difference comes from the following code in document_source_group.cpp: https://github.com/mongodb/mongo/blob/d95d9dd1e6a34f7af53feee1e55fbc74ae6e32b3/src/mongo/db/pipeline/document_source_group.cpp#L725
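The JavaScript sketch below is a simplified model of the current grouping-key behavior, assuming that the linked code turns a missing value into null only when there is a single _id expression and keeps missing entries when there are several. It is not the server implementation: the helper names (evalFieldPath, currentGroupKey, groupIds) are hypothetical, only document-shaped _id specs with top-level field paths are modeled, `undefined` stands in for a missing field, and JSON.stringify approximates BSON value comparison. Run against the four test documents above, it yields the same sets of groups as the shell output.

```javascript
// Simplified model (an assumption, not server code) of how $group currently
// builds its _id. `undefined` models a missing field; `null` models BSON null.

// Look up a top-level field path such as "$a"; undefined means "missing".
function evalFieldPath(doc, path) {
  const field = path.slice(1);
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : undefined;
}

// Current behavior: one _id expression maps missing to null, but a
// multi-field _id keeps missing entries, which then vanish from the output _id.
function currentGroupKey(doc, idSpec) {
  const names = Object.keys(idSpec);
  if (names.length === 1) {
    const v = evalFieldPath(doc, idSpec[names[0]]);
    return { [names[0]]: v === undefined ? null : v };
  }
  const key = {};
  for (const name of names) {
    const v = evalFieldPath(doc, idSpec[name]);
    if (v !== undefined) key[name] = v; // missing stays missing
  }
  return key;
}

// Collect the distinct _id values; JSON.stringify approximates BSON equality.
function groupIds(docs, idSpec, keyFn) {
  const groups = new Map();
  for (const doc of docs) {
    const id = keyFn(doc, idSpec);
    groups.set(JSON.stringify(id), id);
  }
  return [...groups.values()];
}

const docs = [
  { _id: 0, a: 1, b: 0 },
  { _id: 1, a: null, b: 1 },
  { _id: 2, b: 2 },
  { _id: 3, a: null, b: 2 },
];

console.log(groupIds(docs, { a: "$a" }, currentGroupKey)); // 2 groups
console.log(groupIds(docs, { a: "$a", b: "$b" }, currentGroupKey)); // 4 groups, including { b: 2 }
```

In the multi-field case, document 2 (a missing) and document 3 (a: null) land in different groups, which is the inconsistency this issue describes.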

We should change the multi-field _id behavior to match the other cases, conflating null and missing. This would allow the DISTINCT_SCAN optimization to apply when grouping on multiple _id fields, as described in SERVER-53626.
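A minimal sketch of the proposed rule follows, under the same modeling assumptions: `undefined` stands for a missing field, only top-level field paths are handled, JSON.stringify approximates BSON equality, and the helper names (evalFieldPath, proposedGroupKey, distinctIds) are hypothetical. Every missing _id field is normalized to null, as already happens for a single-field _id. A likely reason this helps DISTINCT_SCAN is that a regular (non-sparse) index stores a missing field under the key null, so an index scan cannot tell the two values apart anyway.

```javascript
// Hypothetical sketch of the proposed multi-field behavior: every missing
// _id field becomes null. `undefined` models missing; `null` models BSON null.

// Look up a top-level field path such as "$a"; undefined means "missing".
function evalFieldPath(doc, path) {
  const field = path.slice(1);
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : undefined;
}

// Build the group key, conflating missing with null for every field.
function proposedGroupKey(doc, idSpec) {
  const key = {};
  for (const [name, path] of Object.entries(idSpec)) {
    const v = evalFieldPath(doc, path);
    key[name] = v === undefined ? null : v;
  }
  return key;
}

// Collect the distinct _id values; JSON.stringify approximates BSON equality.
function distinctIds(docs, idSpec) {
  const groups = new Map();
  for (const doc of docs) {
    const id = proposedGroupKey(doc, idSpec);
    groups.set(JSON.stringify(id), id);
  }
  return [...groups.values()];
}

const docs = [
  { _id: 0, a: 1, b: 0 },
  { _id: 1, a: null, b: 1 },
  { _id: 2, b: 2 },
  { _id: 3, a: null, b: 2 },
];

// Documents 2 and 3 now share the _id { a: null, b: 2 }, leaving three groups.
console.log(distinctIds(docs, { a: "$a", b: "$b" }));
```

With this rule the single-field and multi-field cases treat missing the same way, and the single-field result (two groups) is unchanged.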


Generated at Thu Feb 08 05:51:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.