- 
    Type:
Bug
 - 
    Resolution: Fixed
 - 
    Priority:
Major - P3
 - 
    Affects Version/s: None
 - 
    Component/s: None
 
- 
        Query Optimization
 - 
        Fully Compatible
 - 
        ALL
 - 
        v8.2, v8.0, v7.0
 - 
        
 - 
        None
 
- 
        None
 - 
        None
 - 
        None
 - 
        None
 - 
        None
 - 
        None
 
ISSUE DESCRIPTION AND IMPACT
Certain $group stages are eligible to use a DISTINCT_SCAN on an index prefixed by the grouping field. In order to be eligible for the optimization, the $group can be prefixed by a $sort that matches the sort order of the index, and can either have no accumulator expressions or contain only $first/$last/$top/$bottom expressions.
The issue occurs when the index used for the DISTINCT_SCAN is sparse and one of the documents does not contain the grouping field. It is incorrect to use a sparse index for this optimization since $group and sparse indexes treat missing fields differently: $group treats them as null while sparse indexes do not track an index key for missing fields. A DISTINCT_SCAN over a sparse index will therefore not forward values to the $group where the indexed field is missing, so query results will be lost (compared to if only a collection scan was used, or a non-sparse index).
This could affect the write path if there is a $out/$merge that follows a $group that incorrectly uses a DISTINCT_SCAN on a sparse index (the collection resulting from the $out could have missing documents).
Example:
The collection has a document where the field ‘a’ is missing. Grouping on ‘a’ with the sparse index has incorrect query results.
db.coll.insert({b: 5})
db.coll.aggregate([{$group: {_id: "$a"}}])
// [ { "_id" : null } ]
db.coll.createIndex({a: 1}, {sparse: true})
db.coll.aggregate([{$group: {_id: "$a"}}]) 
// [ ] 
REMEDIATION AND WORKAROUNDS
Users are recommended to upgrade to the fixed versions. Alternatively, users can hint a collection scan (or an alternative, eligible non-sparse index) for affected queries. For example:
db.coll.aggregate([{$group: {"_id": "$a"}}], {hint: {$natural: 1}})
This can also be done via query settings (i.e. for the affected query shape, set the hint to be a collection scan, or an alternative non-sparse index that is eligible).
—-----------------------------------------------------
Original description
The repro shows the $group using a distinct scan on a sparse index. null/missing values affect the query results in this case, so we should not use the sparse index (a regular non-sparse index would be fine though)
Our documentation says we will not use a sparse index if it affects query results, unless the index is explicitly hinted. In this case it's not hinted and changes results.
- related to
 - 
                    
SERVER-111486 $group can still use a distinct scan on a sparse index if an alternative index is available
-         
 - Closed
 
 -         
 - 
                    
SERVER-110771 Create property-based test for correctness, using documents with missing fields
-         
 - Open
 
 -