[SERVER-22093] Take advantage of the COUNT_SCAN optimization when a pipeline has no dependencies
Created: 07/Jan/16  Updated: 21/Nov/16  Resolved: 04/Mar/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 3.3.3

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Benjamin Murphy
Resolution: Done
Votes: 0
Labels: optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22743 Provide fast (estimated) count command Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

db.foo.drop()
for (var i = 0; i < 1000; i++) { db.foo.insert({_id: i}) }
db.foo.explain().aggregate([
    {$match: {_id: {$gte: 0}}},
    {$group: {_id: null, count: {$sum: 1}}}
])

Sprint: Query 10 (02/22/16), Query 11 (03/14/16)
Participants:

 Description   

Consider the following pipeline:

db.million.aggregate([
    {$match: {_id: {$gte: 0}}},
    {$group: {_id: null, count: {$sum: 1}}}
])

This is effectively a count with the predicate {_id: {$gte: 0}}. If we explain this pipeline, we see the following:

db.million.explain().aggregate([{$match: {_id: {$gte: 0}}}, {$group: {_id: null, count: {$sum: 1}}}])
{
    "waitedMS" : NumberLong(0),
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "_id" : {
                        "$gte" : 0
                    }
                },
                "fields" : {
                    "_id" : 0,
                    "$noFieldsNeeded" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.million",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "_id" : {
                            "$gte" : 0
                        }
                    },
                    "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "_id" : 1
                            },
                            "indexName" : "_id_",
                            "isMultiKey" : false,
                            "isUnique" : true,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "_id" : [
                                    "[0.0, inf.0]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$group" : {
                "_id" : {
                    "$const" : null
                },
                "count" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }
    ],
    "ok" : 1
}

Notice in particular that the query planner chooses a plan with a fetch stage on top of an index scan, and that the projection being used is

"fields" : {
    "_id" : 0,
    "$noFieldsNeeded" : 1
}

I believe $noFieldsNeeded is intended to tell the query planner that it can do a fast count, but it does not have that effect. It was introduced in d0037946dc103ffa648f7e8937f2c55351b03c53, but there appear to be no other references to it, either in that commit or on master.
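
To make the observation about the winning plan easy to check, here is a minimal sketch that walks an explain document and lists its stage names. The helper names planStages and cursorWinningPlan are hypothetical (they are not shell or server functions), and the sketch assumes the explain shape shown above: a leading $cursor stage whose queryPlanner.winningPlan nests stages under inputStage (or inputStages).

```javascript
// Collect stage names from a winningPlan, outermost stage first.
// Assumes each stage has a "stage" string and an optional "inputStage"
// (or "inputStages" array), as in the explain output above.
function planStages(plan) {
    if (!plan) {
        return [];
    }
    var children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
    var names = [plan.stage];
    children.forEach(function (child) {
        names = names.concat(planStages(child));
    });
    return names;
}

// Hypothetical helper: pull the winning plan out of an aggregation explain
// whose first stage is a $cursor.
function cursorWinningPlan(explainOutput) {
    return explainOutput.stages[0].$cursor.queryPlanner.winningPlan;
}

// Trimmed copy of the explain output from this ticket.
var explainOutput = {
    stages: [
        {$cursor: {queryPlanner: {winningPlan: {
            stage: "FETCH",
            inputStage: {stage: "IXSCAN", indexName: "_id_"}
        }}}},
        {$group: {_id: {$const: null}, count: {$sum: {$const: 1}}}}
    ]
};

var stages = planStages(cursorWinningPlan(explainOutput));
var usesCountScan = stages.indexOf("COUNT_SCAN") !== -1;
```

For the output above, stages holds FETCH followed by IXSCAN and usesCountScan is false. A plan that used the optimization named in the ticket title would be expected to list COUNT_SCAN instead.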

There are a couple of things we could do about this:

  • Extend the aggregation pipeline to recognize that no fields are needed, and to use the fast count path (used by the count command today) instead of the regular find path
  • Extend the query planner to recognize $noFieldsNeeded, and do something appropriate with it.
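
As a rough illustration of the first option, the sketch below recognizes one narrow shape of pipeline that is logically a count: a single $match followed by a $group on a null _id whose accumulators are all {$sum: 1}. The function looksLikeCount is hypothetical and is not how the server decides this; a real implementation would more likely rely on the pipeline's dependency analysis than on pattern matching.

```javascript
// Hypothetical check, not the server's implementation: returns true when the
// pipeline is exactly [$match, $group] and the $group only counts documents.
function looksLikeCount(pipeline) {
    if (pipeline.length !== 2) {
        return false;
    }
    var match = pipeline[0].$match;
    var group = pipeline[1].$group;
    // Require a $match, a $group, and a constant null group key.
    if (!match || !group || group._id !== null) {
        return false;
    }
    var fields = Object.keys(group).filter(function (key) {
        return key !== "_id";
    });
    // Every accumulator must be exactly {$sum: 1}, which reads no input fields.
    return fields.length > 0 && fields.every(function (key) {
        var acc = group[key];
        return acc !== null && typeof acc === "object" &&
            Object.keys(acc).length === 1 && acc.$sum === 1;
    });
}

// The pipeline from this ticket.
var countPipeline = [
    {$match: {_id: {$gte: 0}}},
    {$group: {_id: null, count: {$sum: 1}}}
];

// A pipeline that reads a field from each document, so it is not a pure count.
var sumPipeline = [
    {$match: {_id: {$gte: 0}}},
    {$group: {_id: null, total: {$sum: "$price"}}}
];

var countIsCountLike = looksLikeCount(countPipeline);
var sumIsCountLike = looksLikeCount(sumPipeline);
```

Here countIsCountLike is true and sumIsCountLike is false, because the second $group depends on the (made-up) field price.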


 Comments   
Comment by Githook User [ 04/Mar/16 ]

Author: Benjamin Murphy (benjaminmurphy) <benjamin_murphy@me.com>

Message: SERVER-22093 Aggregation uses a COUNT plan when no fields are needed from input documents.
Branch: master
https://github.com/mongodb/mongo/commit/aee9f7e2a93d89ccbca459993565b182d5296dfa

Comment by David Storch [ 29/Jan/16 ]

One way to address this would be to allow the use of COUNT_SCAN plans in aggregations. Agg could identify when it is logically doing a count, and pass the PRIVATE_IS_COUNT planning parameter down to the query engine in these cases.

Comment by Mathias Stearn [ 08/Jan/16 ]

Actually, the reason we did this has to do with the semantics of the projection language. {_id: 0} means include all fields except for _id. {_id: 0, someField: 1} means only include someField (_id: 0 is needed because _id is implicitly included as well unless explicitly excluded). We use $noFieldsNeeded as the "someField" for two reasons: 1) it is (mostly) self-describing in an explain; 2) it is an illegal field name in user objects, so it is unlikely to exist in the source document, and we should get back an empty document.

Really, it would be ideal if the projection language had a way to express that no fields are needed, but there isn't a simple way to do that at the moment.
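
The projection rules described in this comment can be modeled with a small sketch. The function project below is a simplified stand-in written for illustration, not the server's projection code: it handles only flat documents with 0/1 values and ignores nested paths and projection operators.

```javascript
// Simplified model of the projection rules described above.
// Flat documents only; not the server's implementation.
function project(doc, spec) {
    var has = function (obj, key) {
        return Object.prototype.hasOwnProperty.call(obj, key);
    };
    var specKeys = Object.keys(spec);
    var inclusion = specKeys.some(function (key) {
        return Boolean(spec[key]);
    });
    var out = {};
    if (inclusion) {
        // Inclusion mode: _id comes along unless it is explicitly excluded.
        var idExcluded = has(spec, "_id") && !spec._id;
        if (!idExcluded && has(doc, "_id")) {
            out._id = doc._id;
        }
        specKeys.forEach(function (key) {
            if (key !== "_id" && spec[key] && has(doc, key)) {
                out[key] = doc[key];
            }
        });
        return out;
    }
    // Exclusion mode: keep every field that the spec does not mention.
    Object.keys(doc).forEach(function (key) {
        if (!has(spec, key)) {
            out[key] = doc[key];
        }
    });
    return out;
}

var doc = {_id: 1, a: 2, b: 3};

var withoutId = project(doc, {_id: 0});                        // all fields except _id
var onlyA = project(doc, {_id: 0, a: 1});                      // only a
var nothing = project(doc, {_id: 0, "$noFieldsNeeded": 1});    // no field matches
```

Under this model, withoutId is {a: 2, b: 3}, onlyA is {a: 2}, and nothing is an empty document, which matches the behavior the comment describes for $noFieldsNeeded.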
