  1. Core Server
  2. SERVER-27389

Use a covered index scan if $sample is the first stage and only the _id is needed

    • Query Optimization

      The optimized code path that uses a random cursor to implement a $sample stage currently appends a FETCH stage on top of the index scan unconditionally: https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/db/pipeline/pipeline_d.cpp#L253-L254

      The FETCH is unnecessary in cases like the following, where only the _id is needed to answer the aggregation:

      > db.foo.drop();
      true
      > for (var i = 0; i < 10000; i++) { db.foo.insert({_id: i}); }
      WriteResult({ "nInserted" : 1 })
      > db.foo.explain().aggregate([{$sample: {size: 10}}, {$bucketAuto: {groupBy: "$_id", buckets: 2}}])
      {
      	"stages" : [
      		{
      			"$cursor" : {
      				"query" : {
      					
      				},
      				"fields" : {
      					"_id" : 1
      				},
      				"queryPlanner" : {
      					"plannerVersion" : 1,
      					"namespace" : "test.foo",
      					"indexFilterSet" : false,
      					"winningPlan" : {
      						"stage" : "FETCH",  // This FETCH stage is not necessary.
      						"inputStage" : {
      							"stage" : "INDEX_ITERATOR"
      						}
      					},
      					"rejectedPlans" : [ ]
      				}
      			}
      		},
      		{
      			"$sampleFromRandomCursor" : {
      				"size" : NumberLong(10)
      			}
      		},
      		{
      			"$bucketAuto" : {
      				"groupBy" : "$_id",
      				"buckets" : 2,
      				"output" : {
      					"count" : {
      						"$sum" : {
      							"$const" : 1
      						}
      					}
      				}
      			}
      		}
      	],
      	"ok" : 1,
      	"operationTime" : Timestamp(0, 0)
      }
      

      This optimization is valid if the pipeline either has no dependencies or depends only on the _id. To check this, we'll need to move the dependency calculation so that it runs before the $sample stage is handled. If the only dependency is the _id, we'll also need to add a PROJECTION stage to transform the index key into a full document.
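
      The decision described above could be sketched roughly as follows. This is only an illustration in shell-style JavaScript, not the server's C++ code: the function name planSampleCursor and the shape of its deps argument (a list of field paths plus a needWholeDocument flag) are assumptions made for this example, and treating the no-dependency case as a bare index scan is one reading of the paragraph above.

```javascript
// Hypothetical sketch of the proposed check; none of these names are server APIs.
// `deps` is an assumed shape:
//   {fields: [paths the rest of the pipeline reads], needWholeDocument: bool}
function planSampleCursor(deps) {
    var onlyId = deps.fields.every(function (path) {
        return path === "_id";
    });

    if (deps.needWholeDocument || !onlyId) {
        // Something other than _id is read, so the document must be fetched.
        return {stage: "FETCH", inputStage: {stage: "INDEX_ITERATOR"}};
    }

    if (deps.fields.length === 0) {
        // No dependencies at all: assume the index scan alone is enough.
        return {stage: "INDEX_ITERATOR"};
    }

    // Only _id is needed: rebuild {_id: ...} from the index key instead of fetching.
    return {stage: "PROJECTION", inputStage: {stage: "INDEX_ITERATOR"}};
}
```

      For the pipeline in the example, $bucketAuto groups by "$_id" only, so under these assumptions the call would be planSampleCursor({fields: ["_id"], needWholeDocument: false}) and the sketch would pick PROJECTION over INDEX_ITERATOR rather than FETCH.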

            Assignee: backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter: charlie.swanson@mongodb.com Charlie Swanson