The optimized code path that uses a random cursor to implement a $sample stage currently always appends a FETCH stage on top of the index scan: https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/db/pipeline/pipeline_d.cpp#L253-L254
The FETCH is unnecessary in cases such as the following, where only the _id is needed to answer the aggregation:
> db.foo.drop();
true
> for (var i = 0; i < 10000; i++) { db.foo.insert({_id: i}); }
WriteResult({ "nInserted" : 1 })
> db.foo.explain().aggregate([{$sample: {size: 10}}, {$bucketAuto: {groupBy: "$_id", buckets: 2}}])
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                },
                "fields" : {
                    "_id" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.foo",
                    "indexFilterSet" : false,
                    "winningPlan" : {
                        "stage" : "FETCH", // This FETCH stage is not necessary.
                        "inputStage" : {
                            "stage" : "INDEX_ITERATOR"
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$sampleFromRandomCursor" : {
                "size" : NumberLong(10)
            }
        },
        {
            "$bucketAuto" : {
                "groupBy" : "$_id",
                "buckets" : 2,
                "output" : {
                    "count" : {
                        "$sum" : {
                            "$const" : 1
                        }
                    }
                }
            }
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(0, 0)
}
This optimization is valid if the pipeline either has no dependencies or depends only on the _id. To check this, we'll need to move the dependency calculation up so that it runs before the handling of a $sample stage. If the only dependency is the _id, we'll also need to add a PROJECTION stage to transform the index key into a full document.
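As a rough illustration of the decision described above, here is a minimal sketch in plain JavaScript. It is not the server's code: the function name planRandomCursorStages and the shape of the deps object (a fields array plus a needWholeDocument flag) are hypothetical and exist only to make the three cases concrete.

```javascript
// Hypothetical sketch, not MongoDB server code. All names here are
// illustrative assumptions.
//
// deps.fields:            field paths the pipeline reads after $sample
// deps.needWholeDocument: true if any stage needs the entire document
function planRandomCursorStages(deps) {
    // The random cursor always starts from the _id index iterator.
    const stages = ["INDEX_ITERATOR"];

    // True when every dependency is exactly "_id" (also true when empty).
    const onlyId = deps.fields.every(function (field) {
        return field === "_id";
    });

    if (deps.needWholeDocument || !onlyId) {
        // Something beyond _id is required, so the document must be fetched.
        stages.push("FETCH");
    } else if (deps.fields.length > 0) {
        // Only _id is required: build a document from the index key instead.
        stages.push("PROJECTION");
    }
    // No dependencies at all: the index iterator alone is enough.
    return stages;
}
```

For the pipeline in this ticket, the only dependency is _id, so the sketch yields INDEX_ITERATOR followed by PROJECTION rather than FETCH. A pipeline that reads any other field, or needs the whole document, still gets the FETCH.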