[JAVA-4206] Collection method countDocuments is order or magnitude slower that native query Created: 24/Jun/21  Updated: 27/Oct/23  Resolved: 24/Jun/21

Status: Closed
Project: Java Driver
Component/s: Internal
Affects Version/s: 4.2.3
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Tomáš Toka Mrázek Assignee: Ross Lawley
Resolution: Works as Designed Votes: 0
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Azure CosmosDB (Mongo API 3.6)
Java Sync Driver 4.2.3
jdk1.8.0_192


Attachments: PNG File image-2021-06-24-14-39-38-399.png     PNG File image-2021-06-24-14-40-11-470.png    

 Description   

We are currently using Azure CosmosDB with Mongo API 3.6. We have a single collection, where all relevant properties are indexed. The collection currently has something about 28m records.

Here is a count query result directly from Azure CosmosDB shell.

Here is the same query run from Robo3T.

Both results are instant. But when I try to run this via Java Driver, it ends up with exception after 60 seconds.

`com.mongodb.MongoExecutionTimeoutException: Request timed out. Retries due to rate limiting: False.`

Here's the part of the code responsible for calling the count query.

```

public Long count(MongoClient client, String database, String collection, Document findQuery, Boolean stats)

{      MongoDatabase db = client.getDatabase(database);      MongoCollection<TransactionEntity> coll = db.getCollection(collection,            TransactionEntity.class);     Long result = coll.countDocuments(findQuery);     printStats(db, stats);     return result; }

```

I am unfortunately unable to pinpoint, what can cause this. From the various reads it looks like the internal implementation is using aggregate pipelines instead of count command, see CountStrategy.AGGREGATE and CountStrategy.COMMAND.

My guess is that the underlying aggregation pipeline is using some unindexed field (although everything in this query is indexed) or not using indexes at all.



 Comments   
Comment by Tomáš Toka Mrázek [ 24/Jun/21 ]

Thank you again for assistance.

In cose someone stumbles on this issue, I resolved it by simply making my own aggregation pipeline innstead of calling countDocuments.

    public Integer count(MongoClient client, String database, String collection, Document findQuery, Boolean stats) {
        MongoDatabase db = client.getDatabase(database);
        MongoCollection<Document> coll = db.getCollection(collection);
        List<? extends Bson> pipeline = Arrays.asList(Aggregates.match(findQuery), Aggregates.count());
        Integer result = coll.aggregate(pipeline).first().getInteger("count");
        printStats(db, stats);
        return result;
    }

Comment by Ross Lawley [ 24/Jun/21 ]

Hi mongodb@tomastokamrazek.cz,

That is strange behaviour as the $count aggregation stage is the equivalent of:

db.collection.aggregate( [
   { $group: { _id: null, myCount: { $sum: 1 } } },
   { $project: { _id: 0 } }
] )

Making the pipeline be:

pipeline = [{'$match': filter},   { $group: { _id: null, myCount: { $sum: 1 } } },  { $project: { _id: 0 } }];

Which has an extra stage, so I wouldn't expect any positive difference regarding performance there.

As this is looking to not be a driver issue and as the driver follows the specifications, I'm marking this down as "Works as Designed".

It looks to be an internal implementation detail of the alternative implementation that is causing you the issue. Should ever want to enjoy the full MongoDB feature set I can highly recommend our own cloud offering MongoDB Atlas.

All the best,

Ross

Comment by Tomáš Toka Mrázek [ 24/Jun/21 ]

Thank you, the pipeline is very helpful, as I am able to produce the issue outside of Java Driver.

It is indeed a weird behaviour, becuase if I change the query from

{'$group': {'_id': 1, 'n': {'$sum': 1}}}

to

{"$count": "count"}

it works as expected.

In that case I'd like to to change the talk around this issue. Is it possible to use count command instead of aggregation pipeline? I inspected the code, but it looks like it's not possible, is there a reason for this?

Comment by Ross Lawley [ 24/Jun/21 ]

Hi mongodb@tomastokamrazek.cz,

Thanks for your email. Unfortunately, we only support the official MongoDB and can't provide any support for CosmosDB issues.

You're correct in that the Mongo Java driver does do an aggregation to filter and count the documents as per the crud specification.

In your scenario it would be:

pipeline = [{'$match': filter},  {'$group': {'_id': 1, 'n': {'$sum': 1}}}];

So if the filter provided doesn't use an index it would require a full scan.

Ross

Generated at Thu Feb 08 09:01:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.