$group can use a distinct scan on a sparse index, which hides null/missing results

XMLWordPrintableJSON

    • Query Optimization
    • Fully Compatible
    • ALL
    • v8.2, v8.0, v7.0
    • Hide
      const query = [{
      	"$group" : {
      		"_id" : "$a.b"
      	}
      }]
      
      const docs = [
          {
              "_id" : 244,
              "a" : NumberInt(0),
          }
      ]
      
      
      const coll = db.foo;
      coll.drop()
      assert.commandWorked(coll.insert(docs))
      
      jsTestLog(coll.aggregate(query).toArray())
      // [ { "_id" : null } ]
      
      assert.commandWorked(coll.createIndex({"a.b": 1}, {sparse: true}))
      jsTestLog(coll.aggregate(query).toArray())
      // [ ]
      
      jsTestLog(coll.explain().aggregate(query))
      // distinct scan on the sparse index, which is why we don't see "null"
      
      Show
      const query = [{ "$group" : { "_id" : "$a.b" } }] const docs = [ { "_id" : 244, "a" : NumberInt(0), } ] const coll = db.foo; coll.drop() assert .commandWorked(coll.insert(docs)) jsTestLog(coll.aggregate(query).toArray()) // [ { "_id" : null } ] assert .commandWorked(coll.createIndex({ "a.b" : 1}, {sparse: true })) jsTestLog(coll.aggregate(query).toArray()) // [ ] jsTestLog(coll.explain().aggregate(query)) // distinct scan on the sparse index, which is why we don't see " null "
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Issue Status

      ISSUE DESCRIPTION AND IMPACT
      Certain $group stages are eligible to use a DISTINCT_SCAN on an index prefixed by the grouping field. In order to be eligible for the optimization, the $group can be prefixed by a $sort that matches the sort order of the index, and can either have no accumulator expressions or contain only $first/$last/$top/$bottom expressions.

      The issue occurs when the index used for the DISTINCT_SCAN is sparse and one of the documents does not contain the grouping field. It is incorrect to use a sparse index for this optimization since $group and sparse indexes treat missing fields differently: $group treats them as null while sparse indexes do not track an index key for missing fields. A DISTINCT_SCAN over a sparse index will therefore not forward values to the $group where the indexed field is missing, so query results will be lost (compared to if only a collection scan was used, or a non-sparse index).

      This could affect the write path if there is a $out/$merge that follows a $group that incorrectly uses a DISTINCT_SCAN on a sparse index (the collection resulting from the $out could have missing documents).

      Example:
      The collection has a document where the field ‘a’ is missing. Grouping on ‘a’ with the sparse index has incorrect query results.

      db.coll.insert({b: 5})
      
      db.coll.aggregate([{$group: {_id: "$a"}}])
      // [ { "_id" : null } ]
      
      db.coll.createIndex({a: 1}, {sparse: true})
      
      db.coll.aggregate([{$group: {_id: "$a"}}]) 
      // [ ] 
      

      REMEDIATION AND WORKAROUNDS
      Users are recommended to upgrade to the fixed versions. Alternatively, users can hint a collection scan (or an alternative, eligible non-sparse index) for affected queries. For example:

      db.coll.aggregate([{$group: {"_id": "$a"}}], {hint: {$natural: 1}})
      

      This can also be done via query settings (i.e. for the affected query shape, set the hint to be a collection scan, or an alternative non-sparse index that is eligible).

      —-----------------------------------------------------

      Original description

      The repro shows the $group using a distinct scan on a sparse index. null/missing values affect the query results in this case, so we should not use the sparse index (a regular non-sparse index would be fine though)

      Our documentation says we will not use a sparse index if it affects query results, unless the index is explicitly hinted. In this case it's not hinted and changes results.

            Assignee:
            Militsa Sotirova
            Reporter:
            Matt Boros
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: