[SERVER-16388] Aggregation operation not using any fields should use an appropriate index instead of collection scan Created: 02/Dec/14  Updated: 25/Jun/15  Resolved: 08/Apr/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Anil Kumar Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-12015 Reenable use of covering indexes with... Closed
Related
Backwards Compatibility: Fully Compatible
Participants:

 Description   

This is not an issue but some basic optimization for aggregation on the collection when no fields are required from the document to complete the aggregation.

The following operation is performing aggregation without using any field from the document, but still results in the full collection scan:

db.places.aggregate([{$group: {_id: null, count: {$sum: 1} } }], {explain: true})

{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"_id" : 0,
					"$noFieldsNeeded" : 1
				},
				"plan" : {
					"cursor" : "BasicCursor",
					"isMultiKey" : false,
					"scanAndOrder" : false,
					"allPlans" : [
						{
							"cursor" : "BasicCursor",
							"isMultiKey" : false,
							"scanAndOrder" : false
						}
					]
				}
			}
		},
		{
			"$group" : {
				"_id" : {
					"$const" : null
				},
				"count" : {
					"$sum" : {
						"$const" : 1
					}
				}
			}
		}
	],
	"ok" : 1
}

Ideally, in non-sharded environment, this should be traversing the _id index and on sharded environment it should be traversing the shard-key index.



 Comments   
Comment by Asya Kamsky [ 07/Mar/15 ]

Aggregation is not generally going to be faster than count so if count has a performance problem, it seems unlikely that aggregate would perform better.

Comment by Christopher Price [ 07/Mar/15 ]

Our application allows customers to search for documents in their database and we supply pagination information regarding the results. (Displaying records 1-100 out of 2147654)

We have noticed performance issues in some count operations and we were recommended to switch to aggregation.

Comment by Asya Kamsky [ 07/Mar/15 ]

You would need to describe your use case - it's not clear to me why you are using aggregation to do a count in your case.

Comment by Christopher Price [ 06/Mar/15 ]

I will reiterate my earlier statement: "It would be preferable to have a consistent interface for this operation."
Yes we could use an "if/then/else" to control if we should add a sort to an aggregation query in this one use case. But that is the exact same "if/then/else" that we have today to determine if should use aggregate or count.

Comment by Asya Kamsky [ 06/Mar/15 ]

A possible work-around until this is implemented would be to add {$sort:{_id:1}} as the first stage to the aggregation - that will use the index on _id.

Comment by Christopher Price [ 05/Mar/15 ]

As recommended by Mongo we moved all of our count operations to use aggregation instead of simple count(). After doing this we discovered this outlier. So now we have to maintain code that first checks if there is any search_criteria, and if not then we use simple count() otherwise we use aggregation for counting. It would be preferable to have a consistent interface for this operation.

Comment by Asya Kamsky [ 03/Dec/14 ]

Is this not just db.collection.count()? Why would it be done via aggregation?

Generated at Thu Feb 08 03:40:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.