-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
When $group has an _id that is a multi-field document, it does distinguish between null and missing; however in other cases (when the _id is not a document or a single-field document) it conflates the two values:
> db.test.find() { "_id" : 0, "a" : 1, "b" : 0 } { "_id" : 1, "a" : null, "b" : 1 } { "_id" : 2, "b" : 2 } { "_id" : 3, "a" : null, "b" : 2 } > db.test.aggregate([{$group: {_id: {a: "$a"}}}]) { "_id" : { "a" : 1 } } { "_id" : { "a" : null } } > db.test.aggregate([{$group: {_id: {a: "$a", b: "$b"}}}]) { "_id" : { "b" : 2 } } { "_id" : { "a" : 1, "b" : 0 } } { "_id" : { "a" : null, "b" : 1 } } { "_id" : { "a" : null, "b" : 2 } }
This is because of the following code: https://github.com/mongodb/mongo/blob/d95d9dd1e6a34f7af53feee1e55fbc74ae6e32b3/src/mongo/db/pipeline/document_source_group.cpp#L725.
We should change the multi-field _id document behavior to mirror the others (conflate null and missing). This would allow for the DISTINCT_SCAN optimization on multiple group _id fields as described in SERVER-53626.
- duplicates
-
SERVER-21992 Inconsistent results when grouping by possibly missing values
- Backlog
- is depended on by
-
SERVER-53626 Minimize index scanning when retrieving distinct values grouped by more than one field
- Backlog