[DRIVERS-630] Documentation for countDocuments MUST mention estimatedDocumentCount Created: 10/Apr/19  Updated: 22/Feb/23  Resolved: 22/Feb/23

Status: Closed
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: John Morales Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on CDRIVER-3146 Documentation for countDocuments MUST... Closed
depends on CSHARP-2619 Documentation for countDocuments MUST... Closed
depends on CXX-1774 Documentation for countDocuments MUST... Closed
depends on JAVA-3299 Documentation for countDocuments MUST... Closed
depends on JAVA-3538 Documentation for Reactive Streams co... Closed
depends on MOTOR-347 Documentation for countDocuments MUST... Closed
depends on NODE-1981 Documentation for countDocuments MUST... Closed
depends on PHPLIB-434 Documentation for countDocuments MUST... Closed
depends on PYTHON-1847 Documentation for countDocuments MUST... Closed
depends on GODRIVER-1083 Documentation for countDocuments MUST... Closed
depends on MONGOID-4958 Link countDocuments <-> estimatedDocu... Closed
depends on RUBY-1817 Link countDocuments <-> estimatedDocu... Closed
Driver Compliance:
Key Status/Resolution FixVersion
PERL-1101 Fixed 2.1.1
GODRIVER-1083 Fixed 1.1.0
CXX-1774 Fixed 3.7.0, 3.7.0-beta1
JAVA-3299 Fixed 3.11.0
MOTOR-347 Fixed 2.1
PHPLIB-434 Fixed 1.7.0-beta1, 1.7.0
CDRIVER-3146 Works as Designed
NODE-1981 Fixed next
SCALA-529 Fixed 2.7.0
JAVA-3538 Fixed 4.0.0
RUBY-1817 Fixed 2.13.1
PYTHON-1847 Fixed 3.9
CSHARP-2619 Fixed 2.10.0
MONGOID-4958 Duplicate

 Description   

The estimatedDocumentCount helper is not easily discoverable by developers migrating from the count helper. For example, estimatedDocumentCount doesn't autocomplete in IDEs when searching for helpers related to count. This is a problem since applications that only need an estimated count can take a large performance hit when using countDocuments for that purpose.

Drivers MUST add text to the documentation for countDocuments directing these users to estimatedDocumentCount. Suggested text:

"For a fast count of the total documents in a collection see estimatedDocumentCount" with a link to the documentation for that helper.

Original report below
----------------------------

It was suggested I file a DRIVERS with some feedback based on a pair of observations converting applications from mgo to mongo-go-driver, although what follows is perhaps general to DRIVERS-501.

In two independent instances, application developers where looking to upgrade to drivers which included the new Count APIs from DRIVERS-501. Developers matched up existing Count() with the new API and, probably via auto-complete on the driver API, intuitively selected CountDocuments() in the case of mongo-go-driver and presumed was equivalent to what the past driver (mgo) was doing.

I'm guessing that the developer intuitively picking CountDocuments() is maybe viewed as a "success" from aim of DRIVERS-501. But, in both of these independent cases it had severe unexpected consequences. These were Count()'s with no query predicates. So, internally to mongod, we were getting the "fast path" constant time document count from collection metadata (and were fine with the inaccuracy). When switched to CountDocuments() and the agg, we were now getting a scan instead.

In particular, one occurrence was against the oplog collection: the worst case. While other user collections on modern mongod versions would at least complete the agg with only an _id index scan, on the oplog it's a full document scan. I understand the scan's count is accurate now.. but IMO don't think is ever what a user would want/expect coming from constant time performance before.

I'm unclear exactly how best to head off future cases. Maybe some API documentation on CountDocuments() that prominently calls out that it's not constant time like EstimatedCount()? Unclear if this would've helped or not. But the two occurrences of falling into the same trap makes me wonder if it could be a common mishap for users.



 Comments   
Comment by João Ferreira [ 23/Apr/19 ]

My experience was also similar, I migrated the old `count(filter)` to `countDocuments(filter)` without noticing the note to migrate some query operators. Changing all the filters was not so easy to keep using the old deprecated count, but I wouldnt mind if there was a non deprecated way to call `estimateDocumentCount(filter)` (i.e with an old filter argument...)

Comment by John Morales [ 12/Apr/19 ]

Fantastic, yep I agree and think referencing estimatedDocumentCount from documentCount is best option, too. Thank you!

Comment by Bernie Hackett [ 10/Apr/19 ]

Oh, another important tidbit, the count command doesn't work in a transaction (that's what caused all this in the first place). If you want to do a count in a transaction you have no choice but to use countDocuments.

Comment by Bernie Hackett [ 10/Apr/19 ]

How about we add text to the docstring in each driver, something like:

"For a fast count of the total documents in a collection see estimatedDocumentCount" with a link to the docs for that method / function.

Comment by Bernie Hackett [ 10/Apr/19 ]

That's a great insight about auto-completion. We debated the names a lot. countAccurate and countEstimate were considered and rejected because using the word "accurate" could be misleading on older versions of the server drivers are still required to support. The "Client Side Count Improvements" scope document has a lot more context in it (especially the comment chains). The naming decision was finalized in a review meeting with the entire management chain.

edit - I recall that we picked the name estimatedDocumentCount because that's what it is (no work is done on the server side to calculate the number, it just returns the number from pre-calculated metadata), as opposed to countDocuments which is what that does on the server side.

At a minimum better docs are in order, especially in Go. Compare Go to Python for example. I don't think any driver calls out using estimatedDocumentCount in the docstring for countDocuments. Doing that would probably help a lot.

Comment by John Morales [ 10/Apr/19 ]

Yeah I figured that – I mentioned the auto-complete as I think in our cases anyway, developers didn't know the EstimatedCount variant was even a thing.

Regarding a performance warning in all drivers, I'm honestly not sure! Maybe? I don't mean to really prescribe anything.. figured I'd start by sharing and starting a conversation. Perhaps other/better options than an API doc change might come out, too. I think awareness of the difference for users already familiar with the past APIs, and awareness of EstimatedCount, are probably the main goals of any ideas.

Comment by Bernie Hackett [ 10/Apr/19 ]

The names were chosen in an attempt to make this clear. countDocuments literally counts the documents. Would you prefer that we add a warning about performance to the docs in all drivers?

Generated at Thu Feb 08 08:21:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.