Type: Task
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Component/s: None
Labels: None

The estimatedDocumentCount helper is not easily discoverable by developers migrating from the count helper. For example, estimatedDocumentCount doesn't autocomplete in IDEs when searching for helpers related to count. This is a problem since applications that only need an estimated count can take a large performance hit when using countDocuments for that purpose.

Drivers MUST add text to the documentation for countDocuments directing these users to estimatedDocumentCount. Suggested text:

"For a fast count of the total documents in a collection see estimatedDocumentCount" with a link to the documentation for that helper.
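As a rough illustration of where such text could live, here is a minimal Python sketch. `ExampleCollection` is a hypothetical stand-in, not any driver's real class; only the method names mirror PyMongo's `count_documents` and `estimated_document_count`, and the cross-reference syntax (`:meth:`) is an assumption about a Sphinx-style documentation toolchain.

```python
class ExampleCollection:
    """Hypothetical stand-in for a driver's collection class (illustration only)."""

    def count_documents(self, filter):
        """Count the documents in this collection that match ``filter``.

        For a fast count of the total documents in a collection see
        :meth:`estimated_document_count`.
        """
        # No real driver logic here; the point is the docstring above.
        raise NotImplementedError("illustration only")

    def estimated_document_count(self):
        """Return an estimate of the document count from collection metadata."""
        raise NotImplementedError("illustration only")
```

With the note placed in the `count_documents` docstring, the pointer shows up in IDE hover text and generated API docs at the point where a migrating developer is most likely to look.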

Original report below
----------------------------

It was suggested I file a DRIVERS ticket with some feedback based on a pair of observations from converting applications from mgo to mongo-go-driver, although what follows is perhaps general to ~~DRIVERS-501~~.

In two independent instances, application developers were looking to upgrade to drivers which included the new Count APIs from ~~DRIVERS-501~~. Developers matched up the existing Count() with the new API and, probably via auto-complete on the driver API, intuitively selected CountDocuments() in the case of mongo-go-driver, presuming it was equivalent to what the past driver (mgo) was doing.

I'm guessing that the developer intuitively picking CountDocuments() is maybe viewed as a "success" from the aim of ~~DRIVERS-501~~. But in both of these independent cases it had severe unexpected consequences. These were Count() calls with no query predicates. So, internally to mongod, we were getting the "fast path" constant-time document count from collection metadata (and were fine with the inaccuracy). When switched to CountDocuments() and the aggregation, we were now getting a scan instead.
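One way a migrating application could keep the old behavior is a small shim like the sketch below. It is written in Python against PyMongo's method names (`count_documents`, `estimated_document_count`); the helper name `legacy_style_count` is hypothetical, and the sketch assumes the caller accepts an inexact result when no predicate is given, as was the case in the two instances described here.

```python
def legacy_style_count(collection, query=None):
    """Count documents, preferring the metadata-based estimate when possible.

    `collection` is assumed to expose PyMongo-style `count_documents(filter)`
    and `estimated_document_count()` methods.
    """
    if not query:
        # No predicate (None or {}): use the fast, possibly inexact estimate
        # instead of an aggregation that scans the collection.
        return collection.estimated_document_count()
    # A real predicate needs an accurate, filter-aware count.
    return collection.count_documents(query)
```

Calling `legacy_style_count(coll)` or `legacy_style_count(coll, {})` takes the estimate path, while `legacy_style_count(coll, {"status": "active"})` falls through to `count_documents`.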

In particular, one occurrence was against the oplog collection: the worst case. While other user collections on modern mongod versions would at least complete the aggregation with only an _id index scan, on the oplog it's a full document scan. I understand the scan's count is accurate now, but IMO I don't think it is ever what a user would want or expect coming from constant-time performance before.

I'm unclear exactly how best to head off future cases. Maybe some API documentation on CountDocuments() that prominently calls out that it's not constant time like EstimatedDocumentCount()? It's unclear whether this would have helped. But two occurrences of falling into the same trap make me wonder if it could be a common mishap for users.

depends on

CDRIVER-3146 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
CSHARP-2619 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
CXX-1774 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
JAVA-3299 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
JAVA-3538 Documentation for Reactive Streams countDocuments MUST mention estimatedDocumentCount (Closed)
MOTOR-347 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
NODE-1981 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
PHPLIB-434 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
PYTHON-1847 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
GODRIVER-1083 Documentation for countDocuments MUST mention estimatedDocumentCount (Closed)
MONGOID-4958 Link countDocuments <-> estimatedDocumentCount documentation (Closed)
RUBY-1817 Link countDocuments <-> estimatedDocumentCount documentation (Closed)


Assignee: Unassigned
Reporter: John Morales (Inactive)
Votes: 0
Watchers: 5

Created: Apr 10 2019 07:49:45 PM UTC
Updated: Feb 22 2023 03:01:24 AM UTC
Resolved: Feb 22 2023 03:01:24 AM UTC
