[SERVER-26337] performance problem with $limit in aggregate query Created: 26/Sep/16  Updated: 27/Sep/16  Resolved: 27/Sep/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dan Salmo Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
Operating System: ALL
Steps To Reproduce:

For example, create a query where the first stage is a $match that only finds 6 docs. Second stage is a $sort on an indexed field. Third stage is a $limit=10.

The total document collection size is about 6GB.

Participants:

 Description   

We seem to have a performance problem with $limit in aggregate queries. The query is very very slow (half a minute) - when the number of docs passed into the limit step are less than the limit size.

The query is instantaneous when there are lots of docs returned from stage 1. However, when there are only a few, the aggregate query is very very slow.

Expected behavior: aggregate performance of $limit should be equivalent to a standard query's use of limit.



 Comments   
Comment by Kelsey Schubert [ 27/Sep/16 ]

Hi dan@prattle.co,

Thanks for opening this ticket. I believe the root cause of the issue that you are observing is described in SERVER-7568.

What is happening here is that currently the aggregation framework prefers an index that can be used in the sort stage. Please take a look at the example explain of an aggregation I put together: there are two indexes, {price: 1} and {status: 1}, and we observe that the index on status is selected for the fetch stage to cover the sort.

> db.foo.aggregate([{$match:{'price':{$gte:1000}}},{$sort : {status:1}}],{explain:true})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"price" : {
						"$gt" : 1000
					}
				},
				"sort" : {
					"status" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.foo",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"price" : {
							"$gt" : 1000
						}
					},
					"winningPlan" : {
						"stage" : "FETCH",
						"filter" : {
							"price" : {
								"$gt" : 1000
							}
						},
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"status" : 1
							},
							"indexName" : "status_1",
							"isMultiKey" : false,
							"direction" : "forward",
							"indexBounds" : {
								"status" : [
									"[MinKey, MaxKey]"
								]
							}
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		}
	],
	"ok" : 1
}

Consequently, a selective query predicate will take longer to execute since more keys need to be examined. The relationship to the set limit is incidental to the root cause.

You can observe the same behavior in the query planner by forcing it to use the sort index. Please see the following example explain outputs, first with a selective predicate and then with its compliment. As you can see, the selective predicate needs to examine 9053 documents before it can return 10, whereas its compliment only needs to examine 10 documents before it can return all 10.

> db.foo.find({'price':{$lt:1000}}).sort({status:1}).hint({status:1}).limit(10).explain('executionStats')
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.foo",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"price" : {
				"$lt" : 1000
			}
		},
		"winningPlan" : {
			"stage" : "LIMIT",
			"limitAmount" : 0,
			"inputStage" : {
				"stage" : "FETCH",
				"filter" : {
					"price" : {
						"$lt" : 1000
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"status" : 1
					},
					"indexName" : "status_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"status" : [
							"[MinKey, MaxKey]"
						]
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 10,
		"executionTimeMillis" : 59,
		"totalKeysExamined" : 9053,
		"totalDocsExamined" : 9053,
		"executionStages" : {
			"stage" : "LIMIT",
			"nReturned" : 10,
			"executionTimeMillisEstimate" : 50,
			"works" : 9054,
			"advanced" : 10,
			"needTime" : 9043,
			"needFetch" : 0,
			"saveState" : 70,
			"restoreState" : 70,
			"isEOF" : 1,
			"invalidates" : 0,
			"limitAmount" : 0,
			"inputStage" : {
				"stage" : "FETCH",
				"filter" : {
					"price" : {
						"$lt" : 1000
					}
				},
				"nReturned" : 10,
				"executionTimeMillisEstimate" : 50,
				"works" : 9053,
				"advanced" : 10,
				"needTime" : 9043,
				"needFetch" : 0,
				"saveState" : 70,
				"restoreState" : 70,
				"isEOF" : 0,
				"invalidates" : 0,
				"docsExamined" : 9053,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 9053,
					"executionTimeMillisEstimate" : 20,
					"works" : 9053,
					"advanced" : 9053,
					"needTime" : 0,
					"needFetch" : 0,
					"saveState" : 70,
					"restoreState" : 70,
					"isEOF" : 0,
					"invalidates" : 0,
					"keyPattern" : {
						"status" : 1
					},
					"indexName" : "status_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"status" : [
							"[MinKey, MaxKey]"
						]
					},
					"keysExamined" : 9053,
					"dupsTested" : 0,
					"dupsDropped" : 0,
					"seenInvalidated" : 0,
					"matchTested" : 0
				}
			}
		}
	},
	"serverInfo" : {
		"host" : "fox",
		"port" : 27017,
		"version" : "3.0.12",
		"gitVersion" : "33934938e0e95d534cebbaff656cde916b9c3573"
	},
	"ok" : 1
}
> db.foo.find({'price':{$gt:1000}}).sort({status:1}).hint({status:1}).limit(10).explain('executionStats')
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.foo",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"price" : {
				"$gt" : 1000
			}
		},
		"winningPlan" : {
			"stage" : "LIMIT",
			"limitAmount" : 0,
			"inputStage" : {
				"stage" : "FETCH",
				"filter" : {
					"price" : {
						"$gt" : 1000
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"status" : 1
					},
					"indexName" : "status_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"status" : [
							"[MinKey, MaxKey]"
						]
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 10,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 10,
		"totalDocsExamined" : 10,
		"executionStages" : {
			"stage" : "LIMIT",
			"nReturned" : 10,
			"executionTimeMillisEstimate" : 0,
			"works" : 11,
			"advanced" : 10,
			"needTime" : 0,
			"needFetch" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"limitAmount" : 0,
			"inputStage" : {
				"stage" : "FETCH",
				"filter" : {
					"price" : {
						"$gt" : 1000
					}
				},
				"nReturned" : 10,
				"executionTimeMillisEstimate" : 0,
				"works" : 10,
				"advanced" : 10,
				"needTime" : 0,
				"needFetch" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 0,
				"invalidates" : 0,
				"docsExamined" : 10,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 10,
					"executionTimeMillisEstimate" : 0,
					"works" : 10,
					"advanced" : 10,
					"needTime" : 0,
					"needFetch" : 0,
					"saveState" : 0,
					"restoreState" : 0,
					"isEOF" : 0,
					"invalidates" : 0,
					"keyPattern" : {
						"status" : 1
					},
					"indexName" : "status_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"status" : [
							"[MinKey, MaxKey]"
						]
					},
					"keysExamined" : 10,
					"dupsTested" : 0,
					"dupsDropped" : 0,
					"seenInvalidated" : 0,
					"matchTested" : 0
				}
			}
		}
	},
	"serverInfo" : {
		"host" : "fox",
		"port" : 27017,
		"version" : "3.0.12",
		"gitVersion" : "33934938e0e95d534cebbaff656cde916b9c3573"
	},
	"ok" : 1
}

Please feel to vote for SERVER-7568 and watch it for updates. Another community member, andrey.hohutkin@gmail.com, has suggested some workarounds, which work for now.

Kind regards,
Thomas

Generated at Thu Feb 08 04:11:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.