Author: David Storch (dstorch) <david.storch@mongodb.com>
Message: SERVER-36723 Push $limit beneath DocumentSourceCursor into the PlanStage layer.
In addition to working towards the general goal of
doing as much query execution as possible with a PlanStage
tree, this should have a positive performance impact for
certain agg pipelines. Previously, a pipeline with a
$project (or a $project-like stage such as $addFields)
followed by a $limit might have applied this limit only
after a full batch of data was loaded by
DocumentSourceCursor. After this change, the limit will take
effect prior to DocumentSourceCursor batching, and thus may
reduce the amount of data processed by the query.
Branch: master
https://github.com/mongodb/mongo/commit/d1a128b434d89f1cba3f1a4a60a117a55291b098
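
For example (collection and field names here are hypothetical), the kind of pipeline the commit message describes is a $project-like stage followed by a $limit; per the commit, the limit now takes effect before DocumentSourceCursor batching rather than only after a full batch is loaded:

db.example.aggregate([
    {$addFields: {total: {$add: ["$a", "$b"]}}},  // $project-like stage
    {$limit: 3}                                   // previously applied only after a full cursor batch
])
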
The plan depends on whether or not there is an index to support the pushdown:
// without index:
db.subset.explain().aggregate({$match:{test:1}},{$sort:{time:-1}},{$limit:3})
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "test" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "agg.subset",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "test" : {
                            "$eq" : 1
                        }
                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                            "test" : {
                                "$eq" : 1
                            }
                        },
                        "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$sort" : {
                "sortKey" : {
                    "time" : -1
                },
                "limit" : NumberLong(3)
            }
        }
    ],
    "ok" : 1
}

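For the second run, the collection has a compound index matching the keyPattern and indexName shown in the explain output below; it would have been created with something like:

db.subset.createIndex({test: 1, time: 1})
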
// with index:
db.subset.explain().aggregate({$match:{test:1}},{$sort:{time:-1}},{$limit:3})
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "test" : 1
                },
                "sort" : {
                    "time" : -1
                },
                "limit" : NumberLong(3),
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "agg.subset",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "test" : {
                            "$eq" : 1
                        }
                    },
                    "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "test" : 1,
                                "time" : 1
                            },
                            "indexName" : "test_1_time_1",
                            "isMultiKey" : true,
                            "multiKeyPaths" : {
                                "test" : [
                                    "test"
                                ],
                                "time" : [ ]
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 2,
                            "direction" : "backward",
                            "indexBounds" : {
                                "test" : [
                                    "[1.0, 1.0]"
                                ],
                                "time" : [
                                    "[MaxKey, MinKey]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        }
    ],
    "ok" : 1
}
The specific pipeline I was looking at had a $match, a $sort, and then a $limit. Pipeline optimization swallows the limit into the $sort, pushes the $match and the $sort down into the $cursor stage, and then pushes the limit back to the beginning of the remaining pipeline, roughly as sketched below.
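
As an illustration only (not actual optimizer output):

// Starting pipeline, as written:
//     [ {$match: {test: 1}}, {$sort: {time: -1}}, {$limit: 3} ]
// After the first optimization pass: the limit is absorbed into the sort, the
// match and sort-with-limit are pushed down into the $cursor stage, and a
// $limit is re-added at the front of the remaining pipeline:
//     [ {$cursor: <pushed-down match + sort-with-limit>}, {$limit: 3} ]
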
Then, in a second pass of optimization, that limit gets pushed into the $cursor stage, which artificially imposes the limit as it batches documents there. Is that what you were thinking?