[SERVER-40465] Inconsistent results from $group with and without index Created: 03/Apr/19  Updated: 29/Oct/23  Resolved: 03/May/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 4.1.4
Fix Version/s: 4.1.11

Type: Bug Priority: Critical - P2
Reporter: Ian Boros Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2019-05-06, Query 2019-05-20
Participants:

 Description   

> db.d.find()
{ "_id" : ObjectId("5ca4fb6f4e1d8532f8828e79"), "a" : 1 }
{ "_id" : ObjectId("5ca4fb744e1d8532f8828e7a"), "a" : [ 1, 2 ] }
> db.d.aggregate( [ { $group : { _id : "$a" } } ] ) // query without index
{ "_id" : 1 }
{ "_id" : [ 1, 2 ] }
> db.d.createIndex({a:1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
 
> db.d.aggregate( [ { $group : { _id : "$a" } } ] ) // query with index
{ "_id" : 1 } // Different from above!
{ "_id" : 2 }

It looks like this is because when an index is present, a DISTINCT_SCAN is used. It seems incorrect to use a DISTINCT_SCAN for a $group when the index is multikey (though we should think about this harder).
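To illustrate why the two plans can disagree, here is a toy model in plain JavaScript. It is a sketch under stated assumptions, not MongoDB's implementation: the helpers `indexKeys`, `distinctScan`, and `groupByWholeValue` are hypothetical, and the `_id` values are simplified to integers. It assumes only that a multikey index stores one key per array element.

```javascript
// Toy model only: these helpers are hypothetical, not MongoDB internals.
const docs = [
  { _id: 1, a: 1 },
  { _id: 2, a: [1, 2] },
];

// A multikey index stores one key per array element, so {a: [1, 2]}
// contributes the keys 1 and 2 rather than a single key [1, 2].
function indexKeys(docs, field) {
  return docs
    .flatMap((d) => (Array.isArray(d[field]) ? d[field] : [d[field]]))
    .sort((x, y) => x - y);
}

// A distinct scan walks the sorted keys and returns each distinct key once.
function distinctScan(keys) {
  return keys.filter((k, i) => i === 0 || k !== keys[i - 1]);
}

// $group without an index groups on the whole field value, arrays included.
function groupByWholeValue(docs, field) {
  const seen = new Map();
  for (const d of docs) {
    seen.set(JSON.stringify(d[field]), d[field]);
  }
  return [...seen.values()];
}

const viaIndex = distinctScan(indexKeys(docs, "a"));
const viaCollScan = groupByWholeValue(docs, "a");
console.log(viaIndex);    // [ 1, 2 ]
console.log(viaCollScan); // [ 1, [ 1, 2 ] ]
```

In this model the index never contains a key for the whole array [ 1, 2 ], so a scan over index keys alone cannot reproduce the collection-scan result.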



 Comments   
Comment by Githook User [ 03/May/19 ]

Author: Ian Boros (puppyofkosh) <puppyofkosh@gmail.com>

Message: SERVER-40465 $group will not distinct scan multikey index
Branch: master
https://github.com/mongodb/mongo/commit/b1a9c9adea89b475fb05660e2a1cad00971e6899

Comment by Ian Boros [ 08/Apr/19 ]

kelsey.schubert, you make a good point about SERVER-28952. It looks like that fix didn't prevent this issue because it only applies when the distinct scan has index bounds other than [MinKey, MaxKey]; a scan over the full [MinKey, MaxKey] range is known as a "simple" distinct scan. Bounds other than [MinKey, MaxKey] only arise when the distinct command is used with a predicate.

Since the $group stage doesn't unwind arrays like the distinct command does, I believe it's incorrect for $group to use a distinct scan when the index is multikey even if it's a "simple" distinct scan.
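The rule discussed in this comment can be sketched as a small decision function. This is a hypothetical illustration: `mayUseDistinctScan` and its fields are invented names, not server code, and it considers only the multikey conditions mentioned in this thread (the real planner has other requirements).

```javascript
// Hypothetical sketch: function and field names are illustrative only.
function mayUseDistinctScan({ isMultikey, fullRangeBounds, forGroupStage }) {
  // A non-multikey index has exactly one key per document for the field,
  // so none of the concerns in this thread apply.
  if (!isMultikey) return true;

  // This ticket: $group does not unwind arrays, so a multikey index is
  // never safe for it, even with "simple" [MinKey, MaxKey] bounds.
  if (forGroupStage) return false;

  // SERVER-28952, as described above: for the distinct command, only the
  // "simple" full-range scan remains allowed on a multikey index.
  return fullRangeBounds;
}

const groupOnMultikey = mayUseDistinctScan({
  isMultikey: true,
  fullRangeBounds: true,
  forGroupStage: true,
});
console.log(groupOnMultikey); // false
```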

Comment by Kelsey Schubert [ 03/Apr/19 ]

"It looks like this is because when an index is present, a DISTINCT_SCAN is used. It seems incorrect to use a DISTINCT_SCAN for a $group when the index is multikey (though we should think about this harder)."

I thought that was the point of SERVER-28952.

Generated at Thu Feb 08 04:55:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.