[CSHARP-3680] Migrating obsolete "Count" to new methods has a problematic cost Created: 20/May/21  Updated: 27/Oct/23  Resolved: 08/Jun/21

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Unknown
Reporter: Jochem Bonarius Assignee: James Kovacs
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by SERVER-57518 16% performance loss switching from C... Backlog
Related
related to NODE-1638 countDocuments is slower then count Closed

 Description   

I'm tasked with cleaning up the warnings for our C# Framework 4.8 project and I'm running into difficulties with resolving obsolete warnings for the Mongo Driver.

The problem is "Count is obsolete. Use CountDocuments" instead. But that's problematic as `CountDocuments` is slower (see linked Issue NODE-1638). That's not an option for us, as we're already having into performance issues. Thus, like the answer to that issue suggests, we would prefer to use `EstimatedDocumentCount`. But that doesn't seem to be a drop-in replacement! `EstimatedDocumentCount` doesn't take a `filter` argument.

Second problem is that `IFindFluent.Count` is also showing the obsolete warning. But IFindFluent doesn't even have a `EstimatedDocumentCount` alternative.

So how do I solve this? We want to update, but don't want a performance impact. Our customers are already complaining.

(And I'm not even talking about the "when migrating from Count to CountDocuments the following query operations must be replaced" part. Breaking changes are high risk for business logic.)



 Comments   
Comment by James Kovacs [ 08/Jun/21 ]

The performance regression between Count and CountDocuments originates from the server and the driver does not introduce any additional latency.

Comment by James Kovacs [ 08/Jun/21 ]

Hi, jochem@netwinst.nl,

Thank you for opening SERVER-57518.

Regarding when Count will be removed from the driver. We follow Semantic Versioning, which means that we will only make API-breaking changes in major releases. Thus any deprecated methods won't be removed until the 3.0 release of the .NET/C# driver. (We are currently preparing for the 2.13.0 release.) Even when the .NET/C# driver removes the Count method, you can still call it via the RunCommand technique mentioned above until such time as the server removes support for it.

I'll close this ticket and let you work with the Server team to discuss the performance regression in SERVER-57518. Thank you for reporting this issue.

Sincerely,
James

Comment by Jochem Bonarius [ 08/Jun/21 ]

Dear @JamesKovacs

Thank you for your time and extensive support. I've opened SERVER-57518. I hope they can help me
Will "Count" be removed from the driver in the future?

Best regards,
Jochem

Comment by James Kovacs [ 07/Jun/21 ]

Hi, jochem@netwinst.nl,

Thank you for running the requested shell commands to confirm that the difference in execution time that you are seeing is due to server behaviour and not the driver. I recommend opening a SERVER ticket and referencing this one for context. You'll get better notifications and clearer communication as the original reporter of the issue if you open it yourself than if I do it on your behalf.

Regarding your follow-on question about server support for EstimatedDocumentCount and CountDocuments. When you call EstimatedDocumentCount, the driver or shell sends the count command without a predicate, which uses the collection metadata to estimate the number of documents in the collection. count has been around since prior to MongoDB 1.0. When you call CountDocuments, the driver or shell sends a $group/$sum aggregation pipeline to MongoDB and the aggregation pipeline has been supported since MongoDB 2.2. (MongoDB 2.2 reached end-of-life in 2014.) Thus any currently supported version of MongoDB supports the commands underlying EstimatedDocumentCount and CountDocuments.

Please let us know if you have any additional questions or concerns.

Sincerely,
James

Comment by Jochem Bonarius [ 04/Jun/21 ]

Dear @James Kovacs @james.kovacs ,

Yes, this works. Thank you.

The results of 20 iters are (which seem quite consistent)

As you indicated this must mean it's a server thing.

 

edit: thinking about this some more, simply stating that this is "a server issue" is maybe oversimplifying things.
I get the arguments for deprecating Count, but if there's no good alternative for all server versions, maybe it should not be done.

Comment by James Kovacs [ 03/Jun/21 ]

Hi, Jochem,

Thank you for trying to run those commands. The mongo shell can be used to connect to a remote MongoDB cluster just as you would use Compass, MongoSH, or Robot3T. I wasn't suggesting that you log onto the mongod server itself to run the mongo shell. I know you've got limited ability to access the cluster and cannot update its software.

CountDocuments isn't actually a MongoDB command but a helper method implemented by the drivers and shell. It appears that your version of Robo3T doesn't implement this helper. What is actually sent to the server is a $group/$sum operation. The following is the $group/$sum that would be generated so you can run the test in Robo3T:

function stopwatch(func) {
    var iters = 10;
    var start = new Date();
    for(var i=0; i<iters; i++) {
        func();
    }
    var end = new Date();
    print((end-start)/iters + " ms");
}
stopwatch(() => db.largeCollection.count({Om:0}));
stopwatch(() => db.largeCollection.aggregate([{$match: {Om:0}}, {$group: {_id: null, count: {$sum:1}}}]))

We look forward to the results from this test.

Sincerely,
James

Comment by Jochem Bonarius [ 03/Jun/21 ]

Dear @James Kovacs @james.kovacs ,

Like I stated in earlier comments: I have no access to the Mongo server itself, as they are managed externally. I can only connect through things like MongoDB Compass and Robo 3T on a remote server

Using MongoDB Compass 1.22.1 build in _MongoSH Beta
When I run the "stopwatch(() =>...)" you suggest, I get a an error:
MongoshInvalidInputError: Cannot pass a function that calls a Mongosh API method as an argument

I tried more, but thing like "print(...)" return undefined
and "new Date()" returns {}

When I try to run the commands through Robo 3T 1.2.1
stopwatch(() => db.Profiles.count({Om:0})); works: 448.7 ms
stopwatch(() => db.Profiles.countDocuments({Om:0})); fails:
TypeError: db.Profiles.countDocuments is not a function

I will say in advance: I cannot update these versions of the software. These are also managed externally.

Comment by James Kovacs [ 02/Jun/21 ]

Hi, Jochem,

Thank you for expressing your concerns about the performance difference between Count(predicate) and CountDocuments(predicate). We understand that a 34% performance degradation is significant and concerning.

We appreciate you taking the time to explain your use case a bit more as it is clear now that EstimatedDocumentCount will not work for your situation. You need to supply a predicate, which will entail performing a count operation (e.g. Count) or $group/$sum aggregation (e.g. CountDocuments) on the server.

Based on our repro, the performance difference between Count(predicate) and CountDocuments(predicate) is due to the query time on the server and not due to any additional time in the driver. We would like to confirm that this is also the case with your use case using your data set and indexes. To verify that the observed performance difference is due to query time on the server, please run the following commands in the mongo shell (thus eliminating the .NET/C# driver as a contributing factor). Note that the stopwatch helper function takes the average time of 10 runs. If the 10-run average doesn't report consistent timings, you can increase this to 100 to improve reproducibility.

function stopwatch(func) {
    var iters = 10;
    var start = new Date();
    for(var i=0; i<iters; i++) {
        func();
    }
    var end = new Date();
    print((end-start)/iters + " ms");
}
stopwatch(() => db.largeCollection.count({Om:0}));
stopwatch(() => db.largeCollection.countDocuments({Om:0}))

Please provide the output of these timings so that we can determine next steps. If the timings show that the performance difference is due to server query time, we can open a SERVER ticket to investigate further. If the timings show similar query performance for count and countDocuments on the server, then we can dig further into the .NET/C# driver - possibly comparing it to pymongo or another driver - to understand the performance differences between Count and CountDocuments.

We appreciate your time and patience as we continue to investigate this issue together.

Sincerely,
James

Comment by Jochem Bonarius [ 02/Jun/21 ]

Dear @James Kovacs @james.kovacs ,

> To solve these problems, we split count into two distinct operations: EstimatedDocumentCount and CountDocuments...

Yes, I understand. This has been stated a number of times by different people, but it doesn't seem to be as simple as that. The underlying function has changed (seen in the performance difference), as well as the interface (See remarks: https://mongodb.github.io/mongo-csharp-driver/2.8/apidocs/html/M_MongoDB_Driver_IMongoCollection_1_CountDocuments.htm). Thus, it's no drop-in replacement.

> Hopefully that explanation helps you understand why CountDocuments can be much slower than count especially for large collections.

Well, no, especially when combined with

> Note that the EstimatedDocumentCount and CountDocuments methods in the .NET/C# driver are very thin wrappers around the underlying metadata count and $group/$sum operations executed on the server.

Becasue in the code I got from Mikalai, it shows that

database.RunCommand<BsonDocument>(new BsonDocument("count", <Collection name>).Add("query", filter));

is about as fast as Count(), while CountDocuments() is 34% slower. That would seem more is happening in CountDocuments which slows down the evaluation.

> Hopefully this information assists you in tuning your count operations.

No, I don't know how to properly solve this. Losing ~150ms in a query is significant in an applications with many transactions. And we have a predicate, so neither EstimatedDocumentCount not CountDocuments seems to be a "good" option,
My first instinct is just to keep using Count(). And if it is removed from the driver in the future, to use Mikalai's raw command.
However, I don't know how to solve this for the IFindFluent interface....

To me, it is bizarre that a performance loss of 34% is considered an acceptable solution for any application, even considering the problems with the previous version. How can you sell this to customers? I mean, our customers expect a performance >improvement< with every new version. If we would release with a 34% performance degradation, our customer support line would be overloaded with complaints.

Comment by James Kovacs [ 31/May/21 ]

Hi, Jochem,

Thank you for your patience in explaining the problem that you've encountered and providing the requested data on documents counts and results. I would like to take a step back to discuss why count was deprecated in the first place as it will provide context as well as a path forward.

The problem with count is that it has historically done two very different things with very different performance profiles. count without a predicate would query the collection metadata, which is a very fast operation. count with a predicate has to query the collection's documents, hopefully supported by an index, to determine the document count. This could be a very time-consuming, I/O intensive operation that might require a full collection scan if a supporting index isn't available. Another problem is sharded collections where count without a predicate will simply sum up the metadata counts on each shard - which could over-count documents if there were orphaned documents or in-flight migrations.

To solve these problems, we split count into two distinct operations: EstimatedDocumentCount and CountDocuments...

EstimatedDocumentCount simply returns the document count from the collection metadata. This is why you cannot supply a predicate to EstimatedDocumentCount - because it is simply returning the value in the collection metadata. It is guaranteed to be fast because it never has to touch documents or indexes. The downside is that it is estimated and can over-count documents in sharded collections due to orphans or migrations.

CountDocuments is implemented using the Aggregation pipeline, specifically $group/$sum. It will give an accurate count of documents in a collection based on a predicate. Even if no predicate is supplied, it will execute the $group/$sum aggregation with an empty filter to obtain the true count of documents in the collection correctly filtering out orphans in sharded collections. While it provides a more accurate count, it does require more work by the server to return the result.

If you only need an approximate count of documents in a collection, then EstimatedDocumentCount is the method you should use. It is going to be much faster because it is only examining collection metadata but be aware that it can over-count orphans. If you need an accurate document count, then CountDocuments is the method you should use and you should ensure that you have an appropriate index present to support the supplied predicate.

Hopefully that explanation helps you understand why CountDocuments can be much slower than count especially for large collections. When migrating away from the deprecated count operation, the question really becomes how accurate do you require your counts? Can you code tolerate over-counting orphans? If so, then EstimatedDocumentCount will likely be sufficient for your needs. If you need to supply a predicate or cannot tolerate over-counting orphans, then you should use CountDocuments and ensure that you have an index to support your predicate.

In summary count has been split into two operations: EstimatedDocumentCount and CountDocuments. Rather than implicitly switching between reading from metadata and actually counting the documents (using an index or full collection scan) as we did with count, implementers can now chose to read the count from metadata via EstimatedDocumentCount or accurately counting the documents in the collection using CountDocuments.

Note that the EstimatedDocumentCount and CountDocuments methods in the .NET/C# driver are very thin wrappers around the underlying metadata count and $group/$sum operations executed on the server. Thus the majority of the performance difference will be due to server execution and not the .NET/C# implementation in the driver. Hopefully this information assists you in tuning your count operations.

Please let us know if you have any additional questions.

Sincerely,
James

Comment by Jochem Bonarius [ 27/May/21 ]

@Mikalai Mazurenka / @mikalai.mazurenka,

I modified your code to run the original query on mt production database (I guess it does the same, didn't report the result count)

The numbers I see are:

Average results from 300 runs 

  • Count 426
  • CountDocuments 571
  • RunCommand 429

So CountDocuments is taking 34% more time! I consider that quite significant. And thus a big problem

i.e. CountDocuments is not a drop-in replacement for Count...

Comment by Jochem Bonarius [ 27/May/21 ]

@Mikalai Mazurenka / @mikalai.mazurenka,

I've modified your test to match our corporate password protected server.

Results

  • Count 424
  • CountDocuments 499
  • RunCommand 425

That's all good and well, but doesn't represent my results. I will update your search to match our database documents and a realistic search.

The DB is 3.4.9

The Driver is 2.10.3

What do you mean with "Configuration"?

(Note: server is corporate and app is also very big and maintained by a number of people, so "update your version" is not achievable by me)

Comment by Jochem Bonarius [ 27/May/21 ]

@Jeffrey Yemin / @jeff.yemin,

The numbers are in the earlier comments. Total documents 2'721'199, query returns 2'060'277.

Comment by Mikalai Mazurenka (Inactive) [ 25/May/21 ]

Hi jochem@netwinst.nl

Could you please use the provided example to check your execution results: https://gist.github.com/MikalaiMazurenka/10c7c092c4773ae10ea76f67f53f5fb0

Note, that for cleaner results I added a time period for driver to set up and calculate average from multiple runs on large dataset, also have the RunCommand present as a "control group".

I see around 5% difference, but that may depend on server version, configuration and less on secondary parameters like OS, compression, encryption, authentication, etc.

So could you please provide your results along with server version, driver version and configuration?

Comment by Jeffrey Yemin [ 25/May/21 ]

jochem@netwinst.nl sorry, one more question. What does count actually return (i.e. how many documents match the query filter)?

Comment by Jochem Bonarius [ 25/May/21 ]

Hi @Jeff Yemin and @Mikalai Mazurenka .... @jeff.yemin and @mikalai.mazurenka , (<- I don;t get how user tagging works...)

  • Total number of documents: (For my example: differs between collections and customers) 2'721'199 documents
  • Query filter: Filter.Eq("Om", 0)
  • All of the indexes on this collection: { "Pr.P2" : 1, "CD" : 1 } { "Pr.P2" : 1 } { "Pr.P7" : 1 } { "HI.B" : 1 } { "HI.C" : 1 } { "PI.RI" : 1 } { "Om" : 1 } { "Pr.P100" : 1 } { "Pr.P84" : 1 } { "Pr.P2" : 1, "Pr.P8" : 1 } { "PI.I" : 1 }
  • Deployement: it's a single server. So no scaling. (I think it defaults to sharding?)
Comment by Jeffrey Yemin [ 23/May/21 ]

Hi jochem@netwinst.nl

Can you let us know:

  • total number of documents in collection
  • the query filter you're using
  • all the indexes on the collection
  • deployment (replica set or sharded)
Comment by Jochem Bonarius [ 23/May/21 ]

Did more tests, based on this Stack Overflow item. Added a filter and got differen results:

  • Count returns 2060277 and took ~434 ms.
  • CountDocuments returns 2060277 and took ~575 ms.

Although much better, it's still >25% slower. That's significant and we would rather avoid that, as perfomance is already an issue. How can I achieve that?

Comment by Jochem Bonarius [ 23/May/21 ]

I did some tests myself and found the following for a small collection,which already shows the differences:

  • Count returns 6600 and took 12 ms.
  • CountDocuments returns 6600 and took 25 ms.
  • EstimatedDocumentCount returns 6600 and took 2 ms.

But then for a big collection:

  • Count returns 2721199 and took 12 ms.
  • CountDocuments returns 2721199 and took 196406 ms.
  • EstimatedDocumentCount returns 2721199 and took 4 ms.

That's 196 seconds! 16367x slower! Totally unacceptable...

What are we supposed to do this way?

 

Comment by Kaitlin Mahar [ 20/May/21 ]

Hi jochem@netwinst.nl, Thank you for reaching out. I have moved your ticket to the CSHARP project for their team to triage, as this appears to be a question about the C# driver rather than the Node.js driver.

Generated at Wed Feb 07 21:45:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.