[CXX-663] DBClientBase::query() returns fewer documents than specified by nToReturn
Created: 08/Sep/15  Updated: 07/Oct/15  Resolved: 02/Oct/15

Status: Closed
Project: C++ Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Viacheslav Usov
Assignee: Adam Midvidy
Resolution: Done
Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

mongo::DBClientBase::query() has the following documentation in the header file where it is declared:

@param nToReturn n to return (i.e., limit). 0 = unlimited

There is no other documentation explaining the true semantics of this parameter. This is the case with much of the C++ driver's (non) documentation: one normally has to look at the much better documentation of the JSON interface and try to map the JSON concepts to the C++ concepts. The JSON analogue of the nToReturn parameter seems to be cursor.limit(), documented at http://docs.mongodb.org/manual/reference/method/cursor.limit/#cursor.limit as "analogous to the LIMIT statement in a SQL database".

Unfortunately, nToReturn is not "analogous to the LIMIT statement in a SQL database". The LIMIT clause, also known as TOP, returns ALL the records matching the query criteria if the number of such records does not exceed the specified limit; otherwise, it returns exactly the limit number of records. So if one gets fewer records than the specified limit, one can be sure there are NO MORE matching records.

The behaviour of nToReturn is very different. It can be described as "SOME number of records will be returned, but never more than the specified number". So if one gets fewer records than the specified limit, one CANNOT be sure there are NO MORE matching records.

Why does that happen? Because when nToReturn is used, the server will return only ONE batch of data. The batch will contain no more than the specified number of records, AND ALSO NO MORE THAN MaxBytesToReturnToClientAtOnce bytes, which is 4 MiB (as of v3.0). This second condition is not documented anywhere; the only place I have been able to find that says anything about these limitations is https://docs.mongodb.org/manual/core/cursors/#cursor-batches, which says:

"The MongoDB server returns the query results in batches. Batch size will not exceed the maximum BSON document size."

This is a lie, because the max BSON doc size is 16 MiB, while the server-side limitation is 4 MiB.

Why is that important? Partly because it cost me half a day to figure out why queries against a collection having a few million matching documents, with nToReturn set to 10 thousand, would only return 3-4 thousand. Yes, the documents were 1-2 KiB each, so the batch ran into the undocumented byte limitation. I can easily imagine that other people might run into this issue.
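The arithmetic behind that observation can be sketched roughly as follows. This is a simplified model, not the server's actual logic: it assumes uniformly sized documents, ignores per-message overhead, and the helper name maxDocsInOneBatch is hypothetical. The 4 MiB cap is the figure quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-batch byte cap named in this report
// (MaxBytesToReturnToClientAtOnce, 4 MiB as of v3.0).
constexpr std::int64_t kMaxBytesPerBatch = 4 * 1024 * 1024;

// Hypothetical helper: an approximate upper bound on how many documents
// fit in a single batch when both the count cap and the byte cap apply.
// Assumes every document has the same size and ignores message overhead.
std::int64_t maxDocsInOneBatch(std::int64_t nToReturn, std::int64_t docSizeBytes) {
    assert(nToReturn > 0 && docSizeBytes > 0);
    const std::int64_t byBytes = kMaxBytesPerBatch / docSizeBytes;
    return std::min(nToReturn, byBytes);
}
```

Under these assumptions, a limit of 10,000 yields at most 4096 documents of 1 KiB each, or 2048 documents of 2 KiB each, which is roughly the range of counts observed above.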

More important is that the current behaviour of nToReturn makes it pretty much useless. How did I fix the problem eventually? By NOT using nToReturn and implementing the constraint on top of the driver, which is quite a bit less efficient.
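A minimal sketch of that kind of workaround, enforcing the limit on the client while iterating an unlimited cursor, might look like the following. It is not the code actually used by the reporter. FakeCursor and takeAtMost are hypothetical names introduced for illustration; FakeCursor only mimics the more()/next() iteration pattern of mongo::DBClientCursor so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver cursor. It exposes more() and next(),
// the two calls used to iterate a mongo::DBClientCursor, over plain ints.
class FakeCursor {
public:
    explicit FakeCursor(std::vector<int> docs) : docs_(std::move(docs)), pos_(0) {}
    bool more() const { return pos_ < docs_.size(); }
    int next() {
        assert(more());
        return docs_[pos_++];
    }

private:
    std::vector<int> docs_;
    std::size_t pos_;
};

// Hypothetical helper: reads from a cursor opened WITHOUT nToReturn and stops
// after `limit` documents. If fewer than `limit` documents come back, the
// cursor was exhausted, so there are no more matching documents.
template <typename Doc, typename Cursor>
std::vector<Doc> takeAtMost(Cursor& cursor, std::size_t limit) {
    std::vector<Doc> out;
    while (out.size() < limit && cursor.more()) {
        out.push_back(cursor.next());
    }
    return out;
}
```

With a real cursor the helper would be instantiated with the driver's document type. The cost noted above remains: the server may already have sent a batch containing more documents than the client ends up consuming.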

I am not sure what this problem really is: bad documentation, bad driver code, bad server code, or bad design overall. But the whole point is that people really DO expect in such APIs that if the number of returned records is fewer than the limit, then there are no more matching records. Having it done differently is a very very bad surprise, and having it done differently in an UNKNOWN way is much worse.



 Comments   
Comment by Adam Midvidy [ 07/Oct/15 ]

Hello Viacheslav, it looks like this driver behavior is actually erroneous. The problem is that the driver was sending an incorrect nToReturn value over the wire, which inhibited the server from creating further batches to return to the client. I am sorry that we initially misdiagnosed this issue - you can track its fix at CXX-699.

Comment by Adam Midvidy [ 02/Oct/15 ]

Hi Viacheslav, I have closed this ticket as we will not be able to improve nToReturn semantics until MongoDB 3.2. Please comment or file another ticket if you have any further questions.
