  1. Core Server
  2. SERVER-22093

Take advantage of the COUNT_SCAN optimization when a pipeline has no dependencies


    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:

      db.foo.drop()
      for (var i = 0; i < 1000; i++) { db.foo.insert({_id: i}) }
      db.foo.explain().aggregate([
          {$match: {_id: {$gte: 0}}},
          {$group: {_id: null, count: {$sum: 1}}}
      ])
      

    • Sprint:
      Query 10 (02/22/16), Query 11 (03/14/16)

      Description

      Consider the following pipeline:

      db.million.aggregate([
          {$match: {_id: {$gte: 0}}},
          {$group: {_id: null, count: {$sum: 1}}}
      ])
      

      This is effectively a count with the predicate {_id: {$gte: 0}}. If we explain this pipeline, we see the following:

      db.million.explain().aggregate([{$match: {_id: {$gte: 0}}}, {$group: {_id: null, count: {$sum: 1}}}])
      {
          "waitedMS" : NumberLong(0),
          "stages" : [
              {
                  "$cursor" : {
                      "query" : {
                          "_id" : {
                              "$gte" : 0
                          }
                      },
                      "fields" : {
                          "_id" : 0,
                          "$noFieldsNeeded" : 1
                      },
                      "queryPlanner" : {
                          "plannerVersion" : 1,
                          "namespace" : "test.million",
                          "indexFilterSet" : false,
                          "parsedQuery" : {
                              "_id" : {
                                  "$gte" : 0
                              }
                          },
                          "winningPlan" : {
                              "stage" : "FETCH",
                              "inputStage" : {
                                  "stage" : "IXSCAN",
                                  "keyPattern" : {
                                      "_id" : 1
                                  },
                                  "indexName" : "_id_",
                                  "isMultiKey" : false,
                                  "isUnique" : true,
                                  "isSparse" : false,
                                  "isPartial" : false,
                                  "indexVersion" : 1,
                                  "direction" : "forward",
                                  "indexBounds" : {
                                      "_id" : [
                                          "[0.0, inf.0]"
                                      ]
                                  }
                              }
                          },
                          "rejectedPlans" : [ ]
                      }
                  }
              },
              {
                  "$group" : {
                      "_id" : {
                          "$const" : null
                      },
                      "count" : {
                          "$sum" : {
                              "$const" : 1
                          }
                      }
                  }
              }
          ],
          "ok" : 1
      }
      

      Notice in particular that the query planner chooses a plan with a fetch stage on top of an index scan, and that the projection being used is

      "fields" : {
          "_id" : 0,
          "$noFieldsNeeded" : 1
      }
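
To check a plan like the one above programmatically, the stage names can be pulled out of the winningPlan document. This is a minimal sketch in plain JavaScript: stageChain is a hypothetical helper, not a shell built-in, and it assumes a linear plan where each stage has at most one inputStage (plans with several children under inputStages are ignored).

```javascript
// Hypothetical helper, not part of the mongo shell: walks the "inputStage"
// chain of a winningPlan document and returns the stage names, outermost first.
// Assumes a linear plan; stages with multiple children ("inputStages") are not followed.
function stageChain(winningPlan) {
    const stages = [];
    for (let node = winningPlan; node; node = node.inputStage) {
        stages.push(node.stage);
    }
    return stages;
}

// Trimmed copy of the winningPlan from the explain output in this ticket.
const winningPlan = {
    stage: "FETCH",
    inputStage: {
        stage: "IXSCAN",
        keyPattern: {_id: 1},
        indexName: "_id_",
        direction: "forward"
    }
};

const chain = stageChain(winningPlan);
console.log(chain.join(" -> "));  // prints "FETCH -> IXSCAN"
```

A plan that took the fast count path would be expected to show a COUNT_SCAN stage in this chain rather than FETCH over IXSCAN.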
      

      I believe $noFieldsNeeded is intended to tell the query planner that it can do a fast count, but it does not have that effect. It was introduced in d0037946dc103ffa648f7e8937f2c55351b03c53, but there appear to be no other references to it, either in that commit or on master.

      There are a couple of things we could do about this:

      • Extend the aggregation pipeline to recognize that no fields are needed, and to use the fast count path (used by the count command today) instead of the regular find path
      • Extend the query planner to recognize $noFieldsNeeded, and do something appropriate with it.
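
As a rough illustration of the first option, the check "this pipeline is really a count" could look something like the sketch below. It is plain JavaScript for readability, not the server's implementation; asCount is a hypothetical name, and it only recognizes the exact shape used in this ticket (a $match followed by a $group with _id: null and a single {$sum: 1} accumulator).

```javascript
// Hypothetical sketch: detect a pipeline of the form
//   [{$match: <predicate>}, {$group: {_id: null, <field>: {$sum: 1}}}]
// and return the predicate and output field name, or null if the shape differs.
function asCount(pipeline) {
    if (!Array.isArray(pipeline) || pipeline.length !== 2) return null;
    const [matchStage, groupStage] = pipeline;
    if (!matchStage || !groupStage) return null;

    const predicate = matchStage.$match;
    const group = groupStage.$group;
    if (typeof predicate !== "object" || predicate === null) return null;
    if (typeof group !== "object" || group === null) return null;

    // Exactly _id plus one accumulator, with every document in a single group.
    const keys = Object.keys(group);
    if (keys.length !== 2 || !("_id" in group) || group._id !== null) return null;

    // The one accumulator must be {$sum: 1}, i.e. it needs no document fields.
    const field = keys.find((k) => k !== "_id");
    const acc = group[field];
    if (typeof acc !== "object" || acc === null) return null;
    const accKeys = Object.keys(acc);
    if (accKeys.length !== 1 || accKeys[0] !== "$sum" || acc.$sum !== 1) return null;

    return {predicate, field};
}

const rewritten = asCount([
    {$match: {_id: {$gte: 0}}},
    {$group: {_id: null, count: {$sum: 1}}}
]);
console.log(JSON.stringify(rewritten));
```

One difference a real rewrite would have to preserve: when nothing matches, this $group produces no output document at all, whereas the count command reports 0.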


              People

              Assignee:
              benjamin.murphy Benjamin Murphy
              Reporter:
              charlie.swanson Charlie Swanson