[SERVER-48488] $group does not use index when grouping on _id attributes Created: 29/May/20  Updated: 29/May/20  Resolved: 29/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Olivier Boudet Assignee: Carl Champain (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

$group used on partial _id field does not use index.

I have a collection with an compound index field. For example like this one : 

 

db.test_group_index.insert([
    {
        _id: {
            attr1: "value1",
            attr2: "value2",
            attr3: "value3.1",
        },
        attr4: "value4.1"
    },
    {
        _id: {
            attr1: "value1",
            attr2: "value2",
            attr3: "value3.2",
        },
        attr4: "value4.2"
    },
    {
        _id: {
            attr1: "value1",
            attr2: "value2",
            attr3: "value3.3",
        },
        attr4: "value4.3"
    }
    ]);
 

 

I want to group over two _id field to work only on distinct values for keys attr1 & attr2. 

I do this :

 

    
db.test_group_index.explain().aggregate([
    {
        $group: {
            _id: {
                attr1: "$_id.attr1",
                attr2: "$_id.attr2"
            },
            attr4: {
              $first: "$attr4"  
            }
        }
    }
    ])

The result of this explain query shows that a COLLSCAN is used.

 

 

 

{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"attr4" : 1,
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "ref.test_group_index",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "8B3D4AB8",
					"planCacheKey" : "8B3D4AB8",
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$group" : {
				"_id" : {
					"attr1" : "$_id.attr1",
					"attr2" : "$_id.attr2"
				},
				"attr4" : {
					"$first" : "$attr4"
				}
			}
		}
	],
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1590759417, 4),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : 0
		}
	},
	"operationTime" : Timestamp(1590759414, 1)
}

 



 Comments   
Comment by Carl Champain (Inactive) [ 29/May/20 ]

Hi o.boudet@gmail.com,

The documentation mentions:

The $group stage can sometimes use an index to find the first document in each group if all of the following criteria are met:

  • The $group stage is preceded by a $sort stage that sorts the field to group by,
  • There is an index on the grouped field which matches the sort order and
  • The only accumulator used in the $group stage is $first.

So you need to add a $sort stage to the query:

db.test_group_index.explain().aggregate([{$sort: {_id:1, attr4:1}},{$group: { _id: { attr1: "$_id.attr1", attr2: "$_id.attr2"}, attr4: { $first: "$attr4" }}}])

That said, the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the MongoDB Community Forums or on Stack Overflow with the mongodb tag.

Kind regards,
Carl

Generated at Thu Feb 08 05:17:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.