[SERVER-29421] Aggregation executionStats or allPlansExecution explain modes behave incorrectly with $limit Created: 01/Jun/17  Updated: 30/Oct/23  Resolved: 24/Jan/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.5.8
Fix Version/s: 3.7.2

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-53525 aggregate $lookup is very slow, it i... Closed
Related
related to SERVER-35967 $sample with explain(true) hangs Closed
related to SERVER-38063 executionStats or allPlansExecution l... Closed
is related to SERVER-19758 Add "executionStats" and "allPlansExe... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2017-06-19, Query 2017-08-21, Query 2017-12-18, Query 2018-01-01, Query 2018-01-15, Query 2018-01-29
Participants:
Case:

 Description   

When there is a limit that can be absorbed into the $cursor stage of the pipeline, this limit is not respected when reporting explain execution stats. Consider the following example:

> db.c.drop()
true
> db.c.createIndex({a: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> for (var i = 0; i < 10; i++) { db.c.insert({a: 1}); }
WriteResult({ "nInserted" : 1 })
> db.c.explain("executionStats").aggregate([{$match: {a: 1}}, {$limit: 3}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"a" : 1
				},
				"limit" : NumberLong(3),
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.c",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"a" : {
							"$eq" : 1
						}
					},
					"winningPlan" : {
						"stage" : "FETCH",
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"a" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[1.0, 1.0]"
								]
							}
						}
					},
					"rejectedPlans" : [ ]
				},
				"executionStats" : {
					"executionSuccess" : true,
					"nReturned" : 10,
					"executionTimeMillis" : 3,
					"totalKeysExamined" : 10,
					"totalDocsExamined" : 10,
					"executionStages" : {
						"stage" : "FETCH",
						"nReturned" : 10,
						"executionTimeMillisEstimate" : 0,
						"works" : 11,
						"advanced" : 10,
						"needTime" : 0,
						"needYield" : 0,
						"saveState" : 1,
						"restoreState" : 1,
						"isEOF" : 1,
						"invalidates" : 0,
						"docsExamined" : 10,
						"alreadyHasObj" : 0,
						"inputStage" : {
							"stage" : "IXSCAN",
							"nReturned" : 10,
							"executionTimeMillisEstimate" : 0,
							"works" : 11,
							"advanced" : 10,
							"needTime" : 0,
							"needYield" : 0,
							"saveState" : 1,
							"restoreState" : 1,
							"isEOF" : 1,
							"invalidates" : 0,
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"a" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[1.0, 1.0]"
								]
							},
							"keysExamined" : 10,
							"seeks" : 1,
							"dupsTested" : 0,
							"dupsDropped" : 0,
							"seenInvalidated" : 0
						}
					}
				}
			}
		}
	],
	"ok" : 1
}

Here the limit of 3 is indeed present in the $cursor.limit field. However, the $cursor.executionStats field shows that the index scan stage examined all 10 keys and that the fetch stage subsequently retrieved all 10 documents. This reporting is incorrect, since the limit of 3 will prevent us from having to look at all matching keys/documents.

The root cause of this problem is that the limit is applied here as the DocumentSourceCursor is retrieving results from the underlying PlanExecutor. However, this code is not exercised by the explain path. Instead, when running an explain we execute the plan and retrieve stats here. Note that this only affects versions 3.5.5 and newer, e.g. versions after execution stats were added to aggregation explain in SERVER-19758.

To fix this, we should have the DocumentSourceCursor to push the limit down into the PlanExecutor itself. This is also wise from an architecture point of view, since DocumentSourceCursor is intended to be a wrapper around a PlanExecutor rather than acting as a query execution operator in its own right.



 Comments   
Comment by Githook User [ 24/Jan/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-29421 Make agg explain use outer PlanExecutor
Branch: master
https://github.com/mongodb/mongo/commit/fec49f5e76d3b55a191c141974fdcf1aea7cdc96

Comment by David Storch [ 05/Jun/17 ]

natalia.jacobowitz, I had an alternative idea for how to fix this which does not involve pushing the limit down into the PlanExecutor. I discussed briefly with james.wahlin, but let's discuss in person before you start in earnest on implementing the fix.

Generated at Thu Feb 08 04:20:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.