[SERVER-36723] $limit should push down into the PlanStage layer, rather than coalescing with DocumentSourceCursor Created: 17/Aug/18  Updated: 29/Oct/23  Resolved: 10/Oct/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Nicholas Zolnierz Assignee: David Storch
Resolution: Fixed Votes: 3
Labels: storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-43297 Inefficient query on view due to $lim... Closed
is duplicated by SERVER-44309 Not full index use in aggregation pip... Closed
Related
related to SERVER-24860 Optimize away entire pipeline if it c... Closed
related to SERVER-44309 Not full index use in aggregation pip... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2019-06-03, Query 2019-10-07, Query 2019-10-21
Participants:
Linked BF Score: 136

 Description   

DocumentSourceCursor will absorb a $limit stage and manually apply the limit as it batches documents. However, we should instead consider pushing this limit down to the query system, similar to what we do for $match, $sort, etc.
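The difference between the two approaches can be sketched as follows. This is a hypothetical Python model, not actual server code: `batch_then_limit` stands in for the old behavior (DocumentSourceCursor fills a whole batch, then the absorbed $limit truncates it), while `limit_in_plan_stage` stands in for a LIMIT stage pushed into the PlanStage layer, which stops the underlying scan as soon as the limit is reached.

```python
def scan(docs):
    """Stand-in for the PlanStage tree: yields matching documents one by one."""
    for doc in docs:
        yield doc

def batch_then_limit(docs, batch_size, limit):
    """Old behavior: fill a full batch first, then apply the absorbed $limit."""
    scanned = 0
    batch = []
    for doc in scan(docs):
        scanned += 1
        batch.append(doc)
        if len(batch) == batch_size:
            break
    return batch[:limit], scanned

def limit_in_plan_stage(docs, batch_size, limit):
    """New behavior: a LIMIT stage below the cursor halts the scan at `limit`."""
    scanned = 0
    out = []
    for doc in scan(docs):
        scanned += 1
        out.append(doc)
        if len(out) == limit:
            break
    return out, scanned

docs = list(range(1000))
old_result, old_scanned = batch_then_limit(docs, batch_size=100, limit=3)
new_result, new_scanned = limit_in_plan_stage(docs, batch_size=100, limit=3)
# Both return the same 3 documents, but the old path scans a full batch of 100.
```

The results are identical; only the amount of data pulled from the scan differs, which is the performance win the eventual fix describes.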



 Comments   
Comment by Githook User [ 10/Oct/19 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@mongodb.com'}

Message: SERVER-36723 Push $limit beneath DocumentSourceCursor into the PlanStage layer.

In addition to working towards the general goal of
doing as much query execution as possible with a PlanStage
tree, this should have a positive performance impact for
certain agg pipelines. Previously, a pipeline with a
$project (or a $project-like stage such as $addFields)
followed by a $limit might have applied this limit only
after a full batch of data was loaded by
DocumentSourceCursor. After this change, the limit will take
effect prior to DocumentSourceCursor batching, and thus may
reduce the amount of data processed by the query.
Branch: master
https://github.com/mongodb/mongo/commit/d1a128b434d89f1cba3f1a4a60a117a55291b098

Comment by Asya Kamsky [ 01/Oct/18 ]

The plan depends on whether or not there is an index to support the pushdown:

// without index:
db.subset.explain().aggregate({$match:{test:1}},{$sort:{time:-1}},{$limit:3})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"test" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "agg.subset",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"test" : {
							"$eq" : 1
						}
					},
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"filter" : {
							"test" : {
								"$eq" : 1
							}
						},
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sort" : {
				"sortKey" : {
					"time" : -1
				},
				"limit" : NumberLong(3)
			}
		}
	],
	"ok" : 1
}
// with index
db.subset.explain().aggregate({$match:{test:1}},{$sort:{time:-1}},{$limit:3})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"test" : 1
				},
				"sort" : {
					"time" : -1
				},
				"limit" : NumberLong(3),
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "agg.subset",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"test" : {
							"$eq" : 1
						}
					},
					"winningPlan" : {
						"stage" : "FETCH",
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"test" : 1,
								"time" : 1
							},
							"indexName" : "test_1_time_1",
							"isMultiKey" : true,
							"multiKeyPaths" : {
								"test" : [
									"test"
								],
								"time" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "backward",
							"indexBounds" : {
								"test" : [
									"[1.0, 1.0]"
								],
								"time" : [
									"[MaxKey, MinKey]"
								]
							}
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		}
	],
	"ok" : 1
}

Comment by Nicholas Zolnierz [ 17/Aug/18 ]

The specific pipeline I was looking at had a $match, $sort, then $limit. The pipeline optimization would swallow the limit into the sort, push down the match and the sort, but then push the limit back to the beginning of the pipeline.

Then, in a second pass of optimization, the limit gets pushed into the $cursor stage, which artificially imposes the limit while batching documents. Is that what you were thinking?
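The "$sort swallows $limit" coalescing mentioned above lets the sort run as a bounded top-k instead of a full sort (this is what the `"limit" : NumberLong(3)` inside the $sort stage in the explain output reflects). A minimal sketch of the idea, using Python's `heapq` as a stand-in for the server's bounded sorter, not the actual implementation:

```python
import heapq

def topk_sort_desc(docs, key, k):
    # $sort (descending) coalesced with $limit k: a bounded heap keeps only
    # the k best documents in memory instead of sorting the whole input.
    return heapq.nlargest(k, docs, key=key)

docs = [{"time": t} for t in [5, 1, 9, 3, 7]]
result = topk_sort_desc(docs, key=lambda d: d["time"], k=3)
# Top 3 by descending time: 9, 7, 5
```

The open question in this ticket is what happens to the $limit copy that the optimizer re-adds ahead of the coalesced sort: absorbing it into DocumentSourceCursor applies it during batching, rather than pushing it into the PlanStage layer.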

Comment by Asya Kamsky [ 17/Aug/18 ]

I'm pretty sure we already do if it's possible - do you have an example pipeline where we don't do it but we could?

Generated at Thu Feb 08 04:43:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.