|
When $group has an _id that is a multi-field document, it does distinguish between null and missing; however in other cases (when the _id is not a document or a single-field document) it conflates the two values:
> db.test.find()
|
{ "_id" : 0, "a" : 1, "b" : 0 }
|
{ "_id" : 1, "a" : null, "b" : 1 }
|
{ "_id" : 2, "b" : 2 }
|
{ "_id" : 3, "a" : null, "b" : 2 }
|
> db.test.aggregate([{$group: {_id: {a: "$a"}}}])
|
{ "_id" : { "a" : 1 } }
|
{ "_id" : { "a" : null } }
|
> db.test.aggregate([{$group: {_id: {a: "$a", b: "$b"}}}])
|
{ "_id" : { "b" : 2 } }
|
{ "_id" : { "a" : 1, "b" : 0 } }
|
{ "_id" : { "a" : null, "b" : 1 } }
|
{ "_id" : { "a" : null, "b" : 2 } }
|
This is because of the following code: https://github.com/mongodb/mongo/blob/d95d9dd1e6a34f7af53feee1e55fbc74ae6e32b3/src/mongo/db/pipeline/document_source_group.cpp#L725.
We should change the multi-field _id document behavior to mirror the others (conflate null and missing). This would allow for the DISTINCT_SCAN optimization on multiple group _id fields as described in SERVER-53626.
|