[SERVER-19915] Cursor may indicate that it is not exhausted even though next getMore will close the cursor and return no results Created: 12/Aug/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 3.1.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Craig Wilson Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-21086 mongos does not kill cursor with getM... Closed
Related
is related to SERVER-17839 Remove PlanStage::isEOF Backlog
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:

use test
db.foo.insert({_id: 1})
db.foo.insert({_id: 2})
db.foo.insert({_id: 3})
 
db.foo.find({_id: {$gt: 1}}).limit(2)


 Description   

I'm unsure whether this is a bug or an intentional change, but it needs to be either documented or fixed, because certain tests that drivers run have started failing.

In short: in 3.0, when a query exhausted its documents, the reply carried a cursorId of 0. This has changed in 3.1.x, where a non-zero cursorId is returned even when there are no more documents to iterate. This prompts a driver to issue a kill cursor.
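To make the driver-side consequence concrete, here is a minimal sketch of the decision a driver makes after receiving a reply. The function `nextAction` and the reply shape are hypothetical, not any real driver's API. The only rule taken from this report is that a cursorId of 0 means the server has already closed the cursor.

```javascript
// Hypothetical driver bookkeeping (not a real driver API).
// reply.cursorId is a BigInt; 0n means the server already closed the cursor.
function nextAction(reply, limit, totalReturned) {
  if (reply.cursorId === 0n) {
    return "done"; // nothing left to clean up on the server
  }
  if (limit > 0 && totalReturned >= limit) {
    return "killCursors"; // limit satisfied, but the server cursor is still open
  }
  return "getMore"; // more results may remain
}

// The two replies from this report, both returning 2 documents for limit(2).
console.log(nextAction({ cursorId: 0n }, 2, 2));           // "done"        (3.0.5)
console.log(nextAction({ cursorId: 14624793327n }, 2, 2)); // "killCursors" (3.1.7)
```

Under this model the 3.1.7 reply costs one extra message (the kill cursor) even though the client already has every result.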

I captured the wire traffic to make this easier to understand.

MessageLength       95
RequestID           XX
ResponseTo          0
OpCode              OP_QUERY(2004)
Flags               0
FullCollectionName  test.foo
NumberToSkip        0
NumberToReturn      2
Query               { $query: { _id: { $gt: 1 } } }

When running against server 3.0.5, the OP_REPLY is the following:

RequestID       XX
MessageLength   78
ResponseTo      XX
OpCode          OP_REPLY(1)
ResponseFlags   AwaitCapable(8)
CursorID        0
StartingFrom    0
NumberReturned  2
Documents       [{ _id: 2, x: 22 },{ _id: 3, x: 33 }]

However, when running against 3.1.7, the OP_REPLY is the following:

RequestID       XX
MessageLength   78
ResponseTo      XX
OpCode          OP_REPLY(1)
ResponseFlags   AwaitCapable(8)
CursorID        14624793327
StartingFrom    0
NumberReturned  2
Documents       [{ _id: 2, x: 22 },{ _id: 3, x: 33 }]

Notice that the CursorID is non-zero in the 3.1 code.
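For anyone checking captures like the two above, the fixed-size prefix of an OP_REPLY can be decoded as sketched below (Node.js `Buffer`, all integers little-endian). The helpers `parseReplyPrefix` and `buildReplyPrefix` are hypothetical; the builder fills in made-up requestID/responseTo values and omits the documents section, so it only stands in for a real capture.

```javascript
// Decode the 36-byte OP_REPLY prefix: 16-byte message header + reply fields.
function parseReplyPrefix(buf) {
  return {
    messageLength: buf.readInt32LE(0),
    requestID: buf.readInt32LE(4),
    responseTo: buf.readInt32LE(8),
    opCode: buf.readInt32LE(12),        // 1 == OP_REPLY
    responseFlags: buf.readInt32LE(16), // 8 == AwaitCapable
    cursorID: buf.readBigInt64LE(20),   // 0n means the server closed the cursor
    startingFrom: buf.readInt32LE(28),
    numberReturned: buf.readInt32LE(32),
  };
}

// Hypothetical helper that fakes a reply prefix (documents omitted).
function buildReplyPrefix(cursorID, numberReturned, messageLength) {
  const buf = Buffer.alloc(36);
  buf.writeInt32LE(messageLength, 0);
  buf.writeInt32LE(42, 4); // made-up requestID
  buf.writeInt32LE(7, 8);  // made-up responseTo
  buf.writeInt32LE(1, 12); // OP_REPLY
  buf.writeInt32LE(8, 16); // AwaitCapable
  buf.writeBigInt64LE(cursorID, 20);
  buf.writeInt32LE(0, 28); // startingFrom
  buf.writeInt32LE(numberReturned, 32);
  return buf;
}

const r30 = parseReplyPrefix(buildReplyPrefix(0n, 2, 78));            // like 3.0.5
const r31 = parseReplyPrefix(buildReplyPrefix(14624793327n, 2, 78)); // like 3.1.7
console.log(r30.cursorID === 0n); // true: cursor closed with the last batch
console.log(r31.cursorID === 0n); // false: cursor left open
```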

The shell does not issue a kill cursor when running this query, so a wire capture taken from the shell will not show one. I'm unsure whether that is a bug in the shell.



 Comments   
Comment by Scott Hernandez (Inactive) [ 07/Jul/16 ]

This was also observed while working on initial sync for replication: we saw an empty batch response when no documents were left, which meant an extra getMore had been issued just to exhaust the cursor.

It would be good to address this sooner rather than later, as it may be a performance issue for users whose query result count is a multiple of the batchSize: each such query costs an extra network round trip.
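The cost can be estimated with a simplified model, which is an assumption for illustration rather than measured server behavior: a server that detects EOF eagerly needs ceil(n / batchSize) round trips (at least one), while a server that only discovers EOF on the next request needs one extra round trip whenever a non-zero result count n is an exact multiple of batchSize. `roundTrips` is a hypothetical helper.

```javascript
// Simplified round-trip model (assumption, not measured behavior).
// detectsEofEagerly = true models 3.0; false models the 3.1.x behavior in this ticket.
function roundTrips(n, batchSize, detectsEofEagerly) {
  const batches = Math.max(1, Math.ceil(n / batchSize)); // the initial query always costs one trip
  const endsOnBoundary = n > 0 && n % batchSize === 0;    // last batch filled exactly
  return batches + (!detectsEofEagerly && endsOnBoundary ? 1 : 0);
}

console.log(roundTrips(200, 100, true));  // 2
console.log(roundTrips(200, 100, false)); // 3: extra getMore returns an empty batch
console.log(roundTrips(250, 100, false)); // 3: last batch is partial, so no extra trip
```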

Comment by J Rassi [ 05/Oct/15 ]

I have a half-baked idea that may be able to address this.

When a PlanExecutor object is constructed, work() should be called on the root stage until it returns the first result (or reaches EOF). This result should be buffered as member state. When PlanExecutor::getNext() is called, it should hand back this buffered result after calling work() on the root stage to buffer the next one. This would allow for a proper implementation of PlanExecutor::isEOF(): simply report EOF when there is no buffered result. This would also remove the need for the PlanExecutor::enqueue() method.
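A minimal sketch of the buffering idea above, in JavaScript for illustration rather than the server's C++. `LookaheadExecutor`, `ArrayStage`, `firstBatch`, and the `{ state, value }` shape returned by `work()` are hypothetical stand-ins for PlanExecutor, a PlanStage, and its work() states; they are not server APIs.

```javascript
// Executor that always holds the next result, so isEOF() is exact.
class LookaheadExecutor {
  constructor(root) {
    this.root = root;
    this.buffered = null; // { value } for the next result, or null once exhausted
    this._fill();         // eagerly produce the first result at construction
  }
  _fill() {
    for (;;) {
      const r = this.root.work();
      if (r.state === "ADVANCED") { this.buffered = { value: r.value }; return; }
      if (r.state === "IS_EOF") { this.buffered = null; return; }
      // Any other state (e.g. "NEED_TIME"): keep working.
    }
  }
  isEOF() { return this.buffered === null; }
  getNext() {
    if (this.buffered === null) return undefined;
    const { value } = this.buffered;
    this._fill(); // buffer the following result before returning this one
    return value;
  }
}

// Toy root stage over an in-memory array (hypothetical).
class ArrayStage {
  constructor(docs) { this.docs = docs; this.i = 0; }
  work() {
    if (this.i >= this.docs.length) return { state: "IS_EOF" };
    return { state: "ADVANCED", value: this.docs[this.i++] };
  }
}

// Build one batch; the cursor only needs to stay open if results remain.
function firstBatch(exec, batchSize) {
  const docs = [];
  while (docs.length < batchSize && !exec.isEOF()) docs.push(exec.getNext());
  return { docs, cursorOpen: !exec.isEOF() };
}

const batch = firstBatch(new LookaheadExecutor(new ArrayStage([{ _id: 2 }, { _id: 3 }])), 2);
console.log(batch.cursorOpen); // false: a cursorId of 0 could be sent with the last documents
```

Because the executor has already looked one result ahead, the server would know the cursor is exhausted when it builds the final batch, instead of learning it on the next getMore.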

Generated at Thu Feb 08 03:52:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.