[DRIVERS-630] Documentation for countDocuments MUST mention estimatedDocumentCount Created: 10/Apr/19 Updated: 22/Feb/23 Resolved: 22/Feb/23 |
|
| Status: | Closed |
| Project: | Drivers |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | John Morales | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Driver Compliance: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
The estimatedDocumentCount helper is not easily discoverable by developers migrating from the count helper. For example, estimatedDocumentCount doesn't autocomplete in IDEs when searching for helpers related to count. This is a problem since applications that only need an estimated count can take a large performance hit when using countDocuments for that purpose. Drivers MUST add text to the documentation for countDocuments directing these users to estimatedDocumentCount. Suggested text: "For a fast count of the total documents in a collection see estimatedDocumentCount" with a link to the documentation for that helper. Original report below It was suggested I file a DRIVERS with some feedback based on a pair of observations converting applications from mgo to mongo-go-driver, although what follows is perhaps general to In two independent instances, application developers where looking to upgrade to drivers which included the new Count APIs from I'm guessing that the developer intuitively picking CountDocuments() is maybe viewed as a "success" from aim of In particular, one occurrence was against the oplog collection: the worst case. While other user collections on modern mongod versions would at least complete the agg with only an _id index scan, on the oplog it's a full document scan. I understand the scan's count is accurate now.. but IMO don't think is ever what a user would want/expect coming from constant time performance before. I'm unclear exactly how best to head off future cases. Maybe some API documentation on CountDocuments() that prominently calls out that it's not constant time like EstimatedCount()? Unclear if this would've helped or not. But the two occurrences of falling into the same trap makes me wonder if it could be a common mishap for users. |
| Comments |
| Comment by João Ferreira [ 23/Apr/19 ] |
|
My experience was also similar, I migrated the old `count(filter)` to `countDocuments(filter)` without noticing the note to migrate some query operators. Changing all the filters was not so easy to keep using the old deprecated count, but I wouldnt mind if there was a non deprecated way to call `estimateDocumentCount(filter)` (i.e with an old filter argument...) |
| Comment by John Morales [ 12/Apr/19 ] |
|
Fantastic, yep I agree and think referencing estimatedDocumentCount from documentCount is best option, too. Thank you! |
| Comment by Bernie Hackett [ 10/Apr/19 ] |
|
Oh, another important tidbit, the count command doesn't work in a transaction (that's what caused all this in the first place). If you want to do a count in a transaction you have no choice but to use countDocuments. |
| Comment by Bernie Hackett [ 10/Apr/19 ] |
|
How about we add text to the docstring in each driver, something like: "For a fast count of the total documents in a collection see estimatedDocumentCount" with a link to the docs for that method / function. |
| Comment by Bernie Hackett [ 10/Apr/19 ] |
|
That's a great insight about auto-completion. We debated the names a lot. countAccurate and countEstimate were considered and rejected because using the word "accurate" could be misleading on older versions of the server drivers are still required to support. The "Client Side Count Improvements" scope document has a lot more context in it (especially the comment chains). The naming decision was finalized in a review meeting with the entire management chain. edit - I recall that we picked the name estimatedDocumentCount because that's what it is (no work is done on the server side to calculate the number, it just returns the number from pre-calculated metadata), as opposed to countDocuments which is what that does on the server side. At a minimum better docs are in order, especially in Go. Compare Go to Python for example. I don't think any driver calls out using estimatedDocumentCount in the docstring for countDocuments. Doing that would probably help a lot. |
| Comment by John Morales [ 10/Apr/19 ] |
|
Yeah I figured that – I mentioned the auto-complete as I think in our cases anyway, developers didn't know the EstimatedCount variant was even a thing. Regarding a performance warning in all drivers, I'm honestly not sure! Maybe? I don't mean to really prescribe anything.. figured I'd start by sharing and starting a conversation. Perhaps other/better options than an API doc change might come out, too. I think awareness of the difference for users already familiar with the past APIs, and awareness of EstimatedCount, are probably the main goals of any ideas. |
| Comment by Bernie Hackett [ 10/Apr/19 ] |
|
The names were chosen in an attempt to make this clear. countDocuments literally counts the documents. Would you prefer that we add a warning about performance to the docs in all drivers? |