[SERVER-80800] Suggestion to improve CountDocuments performance Created: 06/Sep/23  Updated: 04/Oct/23  Resolved: 04/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ivan Fioravanti Assignee: Noopur Gupta
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by SERVER-57518 16% performance loss switching from C... Backlog
Assigned Teams:
Server Triage
Operating System: ALL
Participants:

 Description   

collection.countDocuments is converted in an aggregate with $math + $group by _id.
 

db.getCollection("test").countDocuments({"type":"test","auditEnabled":{"$ne":true}})

 
becomes:

db.getCollection("test").aggregate(
    [
        {
            "$match" : {
                "type" : "test",
                "auditEnabled" : {
                    "$ne" : true
                }
            }
        },
        {
            "$group" : {
                "_id" : NumberInt(1),
                "n" : {
                    "$sum" : NumberInt(1)
                }
            }
        }
    ],
    {
        "allowDiskUse" : false
    }
);

 

Problem is that if documents in the collection are large, they are fully loaded in memory (apparently) to aggregate them.

Why don't you add a $project by _id after $match? This will greatly improve (10x in our case) performance. Something like:

db.getCollection("test").aggregate(
    [
        {
            "$match" : {
                "type" : "test",
                "auditEnabled" : {
                    "$ne" : true
                }
            }
        },
        {
            "$project" : 
            {
                "_id" : 1
            }
        },
        {
            "$group" : {
                "_id" : NumberInt(1),
                "n" : {
                    "$sum" : NumberInt(1)
                }
            }
        }
    ],
    {
        "allowDiskUse" : false
    }
);

 

 



 Comments   
Comment by Noopur Gupta [ 03/Oct/23 ]

Hi Ivan,

There are 3 different ways to count documents. 

count() , countDocuments(), estimatedCountDocuments()

https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/

https://www.mongodb.com/docs/manual/reference/method/db.collection.count/

https://www.mongodb.com/docs/manual/reference/method/db.collection.estimatedDocumentCount/

 

We have observed differences in the performance between these 3 operations - countDocuments being significantly slower than count and estimatedCountDocuments.

While I appreciate the suggestion on the workaround - to add $project to improve the performance, however this performance issue has already been reported to the Query Optimization team and there are ongoing efforts to make this improvement. Please follow - https://jira.mongodb.org/browse/SERVER-57518 for further details. 

 

Thanks,

Noopur Gupta

Generated at Thu Feb 08 06:44:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.