[SERVER-30799] Tailable cursor with batch size periodically returns unexpected empty batches
Created: 23/Aug/17  Updated: 30/Oct/23  Resolved: 30/Aug/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 3.5.13

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Charlie Swanson
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-29142 Add sharding support for targeting a ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Save this to a file called repro.js at the root of the mongo repository directory:

repro.js

(function() {
    "use strict";
    const st = new ShardingTest({shards: 1});
    const db = st.s.getDB("test");
 
    assert.commandWorked(db.runCommand({create: "capped", capped: true, size: 1024}));
    assert.writeOK(db.capped.insert([{_id: 1}, {_id: 2}, {_id: 3}, {_id: 4}, {_id: 5}, {_id: 6}]));
 
    const findRes =
        assert.commandWorked(db.runCommand({find: "capped", tailable: true, batchSize: 2}));
    const cursorId = findRes.cursor.id;
    assert.neq(cursorId, 0);
    let getMoreRes = assert.commandWorked(
        db.runCommand({getMore: cursorId, collection: "capped", batchSize: 2}));
    // This assertion fails, zero results in this batch.
    assert.eq(getMoreRes.cursor.nextBatch.length, 2);
    st.stop();
}());

Then run:

python buildscripts/resmoke.py repro.js

Sprint: Repl 2017-09-11
Participants:

 Description   

When a tailable cursor is used in combination with a batch size B, the semantics should be:

  • The initial find should return up to B results, without waiting for at least B. That is, if B = 4, but there were only 3 matching documents, it should return just those 3.
  • A getMore against a tailable cursor should return an empty batch if there are no further results.
  • A getMore against a tailable cursor with N < B new results should return those N results immediately, without waiting for B results to accumulate.
  • A getMore against a tailable cursor with N >= B new results should return B, and return the rest on the next getMore (subject to the same batch size constraints).
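
The four rules above can be sketched as a toy model in plain JavaScript. This is only an illustration of the intended semantics, not server code: the helper name `nextBatch` and the `pending` array (documents that match but have not yet been returned) are assumptions made for this example.

```javascript
// Toy model only: `nextBatch` is a hypothetical helper, not a server function.
// `pending` holds matching documents that have not yet been returned.
function nextBatch(pending, batchSize) {
    // Return at most batchSize documents and never wait for more to arrive.
    // splice() removes the returned documents from `pending`.
    return pending.splice(0, batchSize);
}

const pending = [{_id: 1}, {_id: 2}, {_id: 3}];
const first = nextBatch(pending, 2);   // N >= B: a full batch of B documents
const second = nextBatch(pending, 2);  // N < B: just the one remaining document
const third = nextBatch(pending, 2);   // no further results: an empty batch
console.log(first.length, second.length, third.length);  // prints "2 1 0"
```

In this model an empty batch only ever means "nothing new yet", which is the property the mongos path described below fails to preserve.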

These are the semantics when a tailable cursor is used against a mongod process. However, when running against a mongos (which is possible with an unsharded capped collection), the last case breaks. The mongos correctly returns a batch of up to B results, but then mishandles the next getMore, incorrectly 'remembering' that the next result on that cursor should be EOF (see the reproduction steps for more details).

Specifically, the boolean '_eofNext' tracked within the AsyncResultsMerger is not reset on each getMore, so a full batch on one getMore will cause an empty batch on the next, even if there are more results to return.
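
The effect of a flag that is not reset can be reproduced with a toy model in plain JavaScript. Everything here is hypothetical: `ToyMerger`, its fields, and `batchLengths` are invented for illustration and are not the AsyncResultsMerger implementation. In particular, the rule that sets the flag (when the last buffered document is handed out) and the per-request reset are assumptions of this sketch, not a description of the actual patch.

```javascript
// Toy model only: ToyMerger is hypothetical and is not the AsyncResultsMerger.
class ToyMerger {
    constructor(shardDocs, resetOnGetMore) {
        this.shardDocs = shardDocs.slice();   // documents still on the simulated shard
        this.buffer = [];                     // documents already fetched from the shard
        this.eofNext = false;                 // stands in for '_eofNext'
        this.resetOnGetMore = resetOnGetMore; // true models the assumed fix
    }

    // Return the next document, or null to signal the end of the batch.
    next(batchSize) {
        if (this.eofNext) {
            this.eofNext = false;
            return null;
        }
        if (this.buffer.length === 0) {
            this.buffer = this.shardDocs.splice(0, batchSize);
        }
        if (this.buffer.length === 0) {
            return null;
        }
        const doc = this.buffer.shift();
        if (this.buffer.length === 0) {
            // Assumption: handing out the last buffered document marks EOF as next.
            this.eofNext = true;
        }
        return doc;
    }

    // Serve one find or getMore request with the given batch size.
    getBatch(batchSize) {
        if (this.resetOnGetMore) {
            this.eofNext = false;
        }
        const batch = [];
        while (batch.length < batchSize) {
            const doc = this.next(batchSize);
            if (doc === null) {
                break;
            }
            batch.push(doc);
        }
        return batch;
    }
}

// Batch lengths for one find followed by three getMores over six documents.
function batchLengths(resetOnGetMore) {
    const merger = new ToyMerger([1, 2, 3, 4, 5, 6], resetOnGetMore);
    return [0, 1, 2, 3].map(() => merger.getBatch(2).length);
}

console.log(batchLengths(false));  // stale flag: [ 2, 0, 2, 0 ]
console.log(batchLengths(true));   // flag reset per request: [ 2, 2, 2, 0 ]
```

With the stale flag, every full batch is followed by an empty one even though documents remain, which matches the failing assertion in the reproduction script. With the reset, an empty batch appears only once the documents are exhausted.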



 Comments   
Comment by Githook User [ 30/Aug/17 ]

Author:

{'name': 'Charlie Swanson', 'username': 'cswanson310', 'email': 'charlie.swanson@mongodb.com'}

Message: SERVER-30799 Avoid misleading empty batches with tailable cursors.

This bug impacts tailable cursors being sent through a mongos.
Branch: master
https://github.com/mongodb/mongo/commit/bfbeb0cbabd9ae85f34df430474c9e524b274862

Comment by Charlie Swanson [ 23/Aug/17 ]

david.storch I'm adding this to the change streams epic since it's preventing some change stream tests from passing after implementing SERVER-29142. Specifically, some of the assertions in jstests/aggregation/sources/changeStream/change_stream.js use a batch size and a tailable cursor.

Seem okay to you? I have a patch that fixes it locally, but I'm working out some complications imposed by the usage of the AsyncResultsMerger in CollectionCloner, where we apparently are working with a null OperationContext.
