Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-60977

Make $group _id behavior with null and missing more consistent

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Optimization

      When $group has an _id that is a multi-field document, it does distinguish between null and missing; however in other cases (when the _id is not a document or a single-field document) it conflates the two values:

      > db.test.find()
      { "_id" : 0, "a" : 1, "b" : 0 }
      { "_id" : 1, "a" : null, "b" : 1 }
      { "_id" : 2, "b" : 2 }
      { "_id" : 3, "a" : null, "b" : 2 }
      > db.test.aggregate([{$group: {_id: {a: "$a"}}}])
      { "_id" : { "a" : 1 } }
      { "_id" : { "a" : null } }
      > db.test.aggregate([{$group: {_id: {a: "$a", b: "$b"}}}])
      { "_id" : { "b" : 2 } }
      { "_id" : { "a" : 1, "b" : 0 } }
      { "_id" : { "a" : null, "b" : 1 } }
      { "_id" : { "a" : null, "b" : 2 } }
      

      This is because of the following code: https://github.com/mongodb/mongo/blob/d95d9dd1e6a34f7af53feee1e55fbc74ae6e32b3/src/mongo/db/pipeline/document_source_group.cpp#L725.

      We should change the multi-field _id document behavior to mirror the others (conflate null and missing). This would allow for the DISTINCT_SCAN optimization on multiple group _id fields as described in SERVER-53626.

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            katherine.wu@mongodb.com Katherine Wu (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: