[SERVER-19915] Cursor may indicate that it is not exhausted even though next getMore will close the cursor and return no results Created: 12/Aug/15 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.1.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Craig Wilson | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Participants: | |
| Description |
|
I'm unsure whether this is a bug or simply a behavior change, but it needs to be documented or fixed, as certain tests that drivers run have started failing. The gist is that when 3.0 exhausted the documents for a query, it did not return a cursor id. This changed in 3.1.x, where a non-zero cursor id is returned even when there are no more documents to iterate, which prompts a driver to issue a killCursors. I grabbed the wire bits to make this easier to understand.
When running against server 3.0.5, the OP_REPLY is the following:
However, when running against 3.1.7, the OP_REPLY is the following:
Notice that the cursorID is non-zero in the 3.1 reply. When running this from the shell, the shell does not issue a killCursors, so if you are capturing the wire traffic from the shell, that message won't be there. I'm unsure whether that is a bug in the shell or not. |
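Since the captured wire bytes did not survive the export, here is an illustrative sketch of the legacy OP_REPLY layout (field names as in the wire protocol documentation; the struct itself is only for orientation, not server code) showing where the cursorID that differs between 3.0.5 and 3.1.7 sits:

```cpp
#include <cstdint>

// Standard message header shared by all legacy wire protocol messages.
struct MsgHeader {
    int32_t messageLength;  // total message size, including this header
    int32_t requestID;      // identifier for this message
    int32_t responseTo;     // requestID of the OP_QUERY / OP_GET_MORE being answered
    int32_t opCode;         // 1 == OP_REPLY
};

// Illustrative OP_REPLY layout; the reported difference is in cursorID:
// 3.0.5 sends 0 once the query is exhausted, 3.1.7 sends a non-zero id.
struct OpReply {
    MsgHeader header;
    int32_t   responseFlags;   // bit vector (CursorNotFound, QueryFailure, ...)
    int64_t   cursorID;        // 0 means the cursor is closed on the server
    int32_t   startingFrom;    // position of this batch within the cursor
    int32_t   numberReturned;  // number of BSON documents that follow
    // ...followed by numberReturned BSON documents
};
```

A driver treats cursorID == 0 as "fully iterated"; any non-zero id on the final batch forces it to send a killCursors to release the server-side cursor, which is the extra message described above.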
| Comments |
| Comment by Scott Hernandez (Inactive) [ 07/Jul/16 ] |
|
This was also observed while working on replication: during initial sync we saw an empty batch response when there were no more documents left, resulting in an extra getMore being issued to exhaust the cursor. It would be good to address this sooner rather than later, as it can be a performance issue for users whose collection/query result count is an exact multiple of the batchSize, resulting in an extra network round-trip for every query. |
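To make the cost concrete, here is a small self-contained simulation (not driver or server code; the batching behavior is modelled with a hypothetical SimulatedCursor) showing that when the result count is an exact multiple of batchSize, the client needs one extra getMore that returns an empty batch before it finally sees cursorId == 0:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Models a server that only reports cursorId == 0 once a request comes
// back short, i.e. it does not know it is at EOF after a full batch.
struct SimulatedCursor {
    int64_t remaining;
    int64_t nextBatch(int64_t batchSize, int64_t* cursorId) {
        int64_t n = std::min(remaining, batchSize);
        remaining -= n;
        *cursorId = (n < batchSize) ? 0 : 1;  // full batch => cursor stays open
        return n;
    }
};

int main() {
    const int64_t batchSize = 100;
    for (int64_t count : {250, 200}) {  // 200 is an exact multiple of batchSize
        SimulatedCursor cursor{count};
        int64_t cursorId = 1;
        int roundTrips = 0;
        while (cursorId != 0) {
            cursor.nextBatch(batchSize, &cursorId);
            ++roundTrips;
        }
        std::cout << count << " docs, batchSize " << batchSize << ": "
                  << roundTrips << " round trips\n";
    }
    // Prints 3 round trips for both: the 200-document case pays an extra
    // empty getMore that an EOF-aware server could have avoided.
    return 0;
}
```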
| Comment by J Rassi [ 05/Oct/15 ] |
|
I have a half-baked idea that may be able to address this. When a PlanExecutor object is constructed, work() should be called on the root stage until the first result is returned. This result should be buffered as member state. When PlanExecutor::getNext() is called, it should return this buffered result, and then call work() on the root stage to buffer the next result. This would allow for a proper implementation of PlanExecutor::isEOF(): simply return whether or not there is a buffered result. This would also remove the need for the PlanExecutor::enqueue() method. |
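A minimal sketch of that look-ahead buffering, using hypothetical stand-in types (the real PlanStage::work() returns richer states such as NEED_TIME, which the priming loop would also have to handle):

```cpp
#include <optional>
#include <utility>

// Hypothetical stand-ins; not the actual PlanStage/PlanExecutor interfaces.
struct Document { /* BSON document placeholder */ };

class Stage {
public:
    virtual ~Stage() = default;
    // Returns a result, or std::nullopt once the stage is exhausted.
    // (The real work() can also report NEED_TIME etc., so a caller would
    // really loop until it gets a result or reaches EOF.)
    virtual std::optional<Document> work() = 0;
};

class BufferedExecutor {
public:
    explicit BufferedExecutor(Stage* root) : _root(root) {
        // Prime the one-element buffer at construction, per the idea above,
        // so isEOF() is already accurate before the first getNext().
        _buffered = _root->work();
    }

    // EOF reduces to "is the buffer empty?", which lets the final batch of
    // a query be returned together with cursorId == 0.
    bool isEOF() const { return !_buffered.has_value(); }

    std::optional<Document> getNext() {
        std::optional<Document> out = std::move(_buffered);
        _buffered.reset();
        if (out) {
            _buffered = _root->work();  // look ahead for the following result
        }
        return out;
    }

private:
    Stage* _root;
    std::optional<Document> _buffered;
};
```

As noted above, this would also make PlanExecutor::enqueue() unnecessary, at the cost of doing the first unit of work eagerly at construction time.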