[DRIVERS-2518] Add rationale for why the filter parameter is required for countDocuments Created: 14/Dec/22  Updated: 26/Jun/23

Status: Backlog
Project: Drivers
Component/s: CRUD
Fix Version/s: None

Type: Task Priority: Unknown
Reporter: Shane Harvey Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to DRIVERS-501 Implement new count API Closed
Driver Changes: Not Needed

 Description   

When the countDocuments API was added I believe the intention was to make the filter parameter required, that is coll.countDocuments() is not allowed, it has to be coll.countDocuments({}). However I can't find any such rationale in the spec itself. We should add rationale for why the filter parameter is required for countDocuments and perhaps even add a spec test that calling countDocuments without a filter will raise an error.



 Comments   
Comment by Shane Harvey [ 19/Dec/22 ]

Thanks for the deep dive jmikola@mongodb.com. In that case let's not add a test for this behavior since it would be a breaking change in some drivers. Let's just make the rationale clear in the spec and let existing drivers stay as they are.

Comment by Jeremy Mikola [ 15/Dec/22 ]

Writing this was a fun trip down memory lane. I'll address the comments above in the order they were written.

Historical evidence for filter being a required parameter for all CRUD APIs

AFAICT the spec doesn't say that find requires a filter, python supports coll.find() for example as do other drivers.

I was basing that on the method declaration:

find(filter: Document, options: Optional<FindOptions>): Iterable<Document>;

I was able to dig up some old comments supporting the historical justification I alluded to above. This comment from SPEC-76 explains how we ended up requiring explicit filter arguments:

All criteria (now filter) will be required in every operation per Eliot.

And this comment from SPEC-86 provides some more context on how to interpret the API declarations:

The only positional arguments allowed are the required arguments. All other arguments must either be named or part of the options hash

This is also why all spec tests for find, count, and distinct operations specify a filter argument, even if it's just an empty document – not just in the CRUD spec but in derivative specs such as retryable reads as well. There is one reference in the CRUD spec where collection.find({}) is called without an argument but it's in the definition for "Iterable" so I wouldn't consider that authoritative (and it was likely just intended to be a concise reference to the operation name).

The filter parameter for count-related APIs

estimateDocumentCount() uses the count command to get the total document count from collection metadata, a fast operation. countDocuments({}) uses aggregation to count the documents, a comparatively slow operation. The point of requiring the empty document was to re-enforce what the method actually does and what it is for, returning the count of a query.

By design, estimateDocumentCount() cannot take a filter parameter because it relies on the count command's behavior to return an estimated count from collection metadata in the absence of a query filter.

If there was an actual rationale for requiring filter on countDocuments() independent of maintaining consistency with existing CRUD APIs, I could find no record of it. There was a patch in the original code review titled "Make it clear that filter is a required argument" but no comments leading up to it. There likewise isn't any mention about this in the scope doc.

In almost all cases someone calling countDocuments with an empty filter would be better served by estimatedDocumentCount.

No argument there.

Having said that...

As Shane pointed out, Python and probably most drivers do not require filter parameters for various CRUD methods. We can say this is because we priortize BC over spec compliance, but I doubt any driver has bothered to change that in a major version bump. Likewise, drivers written after the CRUD spec don't seem to require filter either (e.g. Rust). In fact, Rust's count_documents() method also has an optional filter and it looks like Node does as well. In flagrant disregard for the CRUD spec, I historically allowed optional filters for all read-only CRUD methods in PHPLIB – and didn't consider any reason not to continue doing so when implementing countDocuments().

I won't object if someone wants to add a rationale to the spec, but I don't think it serves much purpose if we're missing rationale for other methods like find() and distinct() – and I'm not sure what else we might say for those at this point other than "This is what Eliot wanted in 2014". And even if we do that, it doesn't change the fact that at least a handful of drivers allow it to be optional anyway.

Comment by Shane Harvey [ 14/Dec/22 ]

AFAICT the spec doesn't say that find requires a filter, python supports coll.find() for example as do other drivers.

Comment by Jeremy Mikola [ 14/Dec/22 ]

I'm not going to jump through the git history but I imagine this is related to find() also requiring a filter document – and the justification for that goes likely back to the original draft of the CRUD spec (so the only rationale might be in an old Google Doc comment thread).

IMO, it only makes sense to do so for write methods where a default filter could make it possible to fat-finger a deleteMany or updateMany. I've never agreed with the find() method requiring a filter parameter and PHP has historically never required it (in the interest of BC).

Generated at Thu Feb 08 08:25:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.