Details
Improvement
Resolution: Won't Do
Major - P3
None
None
Query Optimization
Description
The optimized code path that uses a random cursor to provide a $sample stage currently unconditionally appends a FETCH stage on top of the index scan: https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/db/pipeline/pipeline_d.cpp#L253-L254
This is unnecessary in cases like the following, where only the _id is needed to answer the aggregation:
> db.foo.drop();
true
> for (var i = 0; i < 10000; i++) { db.foo.insert({_id: i}); }
WriteResult({ "nInserted" : 1 })
> db.foo.explain().aggregate([{$sample: {size: 10}}, {$bucketAuto: {groupBy: "$_id", buckets: 2}}])
{
    "stages" : [
        {
            "$cursor" : {
                "query" : { },
                "fields" : {
                    "_id" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.foo",
                    "indexFilterSet" : false,
                    "winningPlan" : {
                        "stage" : "FETCH", // This FETCH stage is not necessary.
                        "inputStage" : {
                            "stage" : "INDEX_ITERATOR"
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$sampleFromRandomCursor" : {
                "size" : NumberLong(10)
            }
        },
        {
            "$bucketAuto" : {
                "groupBy" : "$_id",
                "buckets" : 2,
                "output" : {
                    "count" : {
                        "$sum" : {
                            "$const" : 1
                        }
                    }
                }
            }
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(0, 0)
}
This optimization is valid only if the pipeline has no dependencies at all, or if its only dependency is the _id. To check this, the dependency calculation must be moved so that it runs before the special handling of a $sample stage. If the _id is the only dependency, a PROJECTION stage must be added to transform the index key into a full-blown document.