[SERVER-62147] Exhaust query using the OP_QUERY protocol is broken when more than one getMore batch is required Created: 17/Dec/21  Updated: 29/Oct/23  Resolved: 13/Jan/22

Status: Closed
Project: Core Server
Component/s: Query Execution
Affects Version/s: 4.2.17, 4.4.10, 5.0.5
Fix Version/s: 5.0.6, 4.2.19, 4.4.13

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-62230 Forward port new exhaust cursor tests... Closed
Related
related to SERVER-68039 Old pymongo version 3.10.1 on MongoDB... Closed
related to CDRIVER-4244 Use OP_MSG for exhaust cursors on 4.2... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4, v4.2
Steps To Reproduce:

Run the following script against a 5.0 server using a 5.0 shell:

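(function() {
const coll = db.c;
coll.drop();

// 2000 documents, each carrying a 16 KiB string: large enough that the query
// needs the initial batch plus more than one getMore batch.
const strSize = 16 * 1024;
const numDocs = 2000;

const str = "A".repeat(strSize);

let bulk = coll.initializeUnorderedBulkOp();
for (let i = 0; i < numDocs; ++i) {
    bulk.insert({key: str});
}
assert.commandWorked(bulk.execute());

print("attempting exhaust query...");
// On affected versions this call hangs instead of returning numDocs.
assert.eq(numDocs, coll.find().addOption(DBQuery.Option.exhaust).itcount());
}());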
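The sizes in the script are what push the query past the buggy threshold. The sketch below estimates how the result set is split into batches. It rests on assumptions that are not stated in this ticket: the initial batch returns at most 101 documents (the documented default), each getMore batch keeps adding documents while the buffered total stays within 16 MiB, and each document has the shape {_id: ObjectId, key: <16 KiB string>} as inserted by the shell. `bsonDocSize` and `planBatches` are hypothetical helpers.

```javascript
// Assumed limits (not taken from the server source).
const INITIAL_BATCH_DOCS = 101;           // default first-batch size in documents
const MAX_BATCH_BYTES = 16 * 1024 * 1024; // per-batch byte cap for getMore batches

// BSON size of {_id: ObjectId(...), key: <strSize-character ASCII string>}.
function bsonDocSize(strSize) {
  const header = 4;                                       // int32 total length
  const idElem = 1 + "_id".length + 1 + 12;               // type + cstring name + ObjectId
  const keyElem = 1 + "key".length + 1 + 4 + strSize + 1; // type + name + int32 len + bytes + NUL
  const terminator = 1;                                   // trailing 0x00
  return header + idElem + keyElem + terminator;
}

// Returns the number of documents in each batch: initial batch first, then getMores.
function planBatches(numDocs, docSize) {
  const batches = [Math.min(numDocs, INITIAL_BATCH_DOCS)];
  let remaining = numDocs - batches[0];
  // How many documents fit under the byte cap (always at least one).
  const perGetMore = Math.max(1, Math.floor(MAX_BATCH_BYTES / docSize));
  while (remaining > 0) {
    const n = Math.min(remaining, perGetMore);
    batches.push(n);
    remaining -= n;
  }
  return batches;
}

const docSize = bsonDocSize(16 * 1024);
const batches = planBatches(2000, docSize);
console.log(docSize, batches);
```

Under these assumptions each document is 16,416 bytes, and the 2000 documents arrive as batches of 101, 1022, and 877: the initial batch followed by two getMore batches, which is the situation named in the title.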

Sprint: QE 2021-12-27, QE 2022-01-10, QE 2022-01-24

Description

Some clients – in particular, the legacy shell, the C driver, and the Python driver – allow the exhaust option to be set on a find operation. This causes the server to write the response to the initial find operation, as well as all subsequent getMore batches, to the socket without waiting for the client to send explicit getMore requests. Exhaust has two implementations in the server, one using the legacy OP_QUERY protocol and another using OP_MSG.
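To make the difference concrete, here is a toy model of the message order for a cursor whose results span several batches, with and without exhaust. It only illustrates the description above. It is not MongoDB's wire format, and `messageFlow` and the message labels are hypothetical.

```javascript
// Order of requests (C: client -> server) and replies (S: server -> client)
// for a cursor whose results are split into numBatches batches.
function messageFlow(numBatches, exhaust) {
  const flow = ["C: find"];
  for (let i = 0; i < numBatches; ++i) {
    if (i > 0 && !exhaust) {
      // A regular cursor needs an explicit getMore for every batch after the first.
      flow.push("C: getMore");
    }
    // With exhaust, the server writes each batch without waiting for a request.
    flow.push(`S: batch ${i + 1}`);
  }
  return flow;
}

console.log(messageFlow(3, false).join(", "));
// C: find, S: batch 1, C: getMore, S: batch 2, C: getMore, S: batch 3
console.log(messageFlow(3, true).join(", "));
// C: find, S: batch 1, S: batch 2, S: batch 3
```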

The server implementation of exhaust cursors using OP_QUERY hangs if the response to the query requires more than two batches (the initial batch plus more than one getMore batch). The logic is broken in such a way that, after sending the first two batches, the server mistakenly categorizes the cursor as non-exhaust. As a consequence, the server waits for the client to send an explicit getMore request. The client, however, correctly believes that the cursor is an exhaust cursor and waits for the server to write the next batch to the connection. Both sides wait for each other, resulting in a hang. The server tests appear not to have caught this hang because every existing exhaust query test returns its full response within one or two batches.
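The hang can be sketched as a small state machine. In this hypothetical model (`simulateExhaust` and its parameters are illustrative, not the server's actual code), the server stops treating the cursor as exhaust once it has pushed `buggyAfter` batches, while the client never sends a getMore because it still expects batches to be pushed.

```javascript
// totalBatches: how many batches the full result needs.
// buggyAfter: batch count after which the server drops the exhaust flag,
//             or null to model the fixed server.
function simulateExhaust(totalBatches, buggyAfter) {
  let sent = 0;
  let serverThinksExhaust = true;
  while (sent < totalBatches) {
    if (serverThinksExhaust) {
      sent += 1; // server pushes the next batch unprompted
      if (buggyAfter !== null && sent >= buggyAfter) {
        serverThinksExhaust = false; // the flaw: cursor recategorized as non-exhaust
      }
    } else {
      // Server waits for an explicit getMore, but the exhaust client is itself
      // waiting for the server. Neither side can make progress.
      return { received: sent, hung: true };
    }
  }
  return { received: sent, hung: false };
}

console.log(simulateExhaust(2, 2));    // { received: 2, hung: false }
console.log(simulateExhaust(3, 2));    // { received: 2, hung: true }
console.log(simulateExhaust(3, null)); // { received: 3, hung: false }
```

With `buggyAfter = 2`, a two-batch result completes but a three-batch result stalls after two batches; passing `null` models the fixed behavior. This also shows why tests whose results fit in one or two batches never exercise the broken path.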

The specific flaw in the code is as follows:

Note that the OP_QUERY protocol was deprecated in 5.0 and removed entirely from the server in version 5.1, including support for the OP_QUERY exhaust path. Clients that support exhaust have changed their implementation to use OP_MSG exhaust when communicating with servers >=5.1; this was changed for the legacy mongo shell in SERVER-57462 and for the Python driver in PYTHON-1636. For this reason, this bug does not affect server version 5.1 or later. Furthermore, it appears to be a regression introduced in version 4.2, so the only affected versions are 4.2, 4.4, and 5.0.
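Based on the Affects Version/s and Fix Version/s fields, a server version can be checked with a helper like the one below. It is a hypothetical sketch (`isAffected` is not part of any MongoDB tool) that assumes, as the description suggests, that every release in the 4.2, 4.4, and 5.0 series before the corresponding fix (4.2.19, 4.4.13, 5.0.6) is affected, and that version strings have the plain `major.minor.patch` form.

```javascript
// First fixed patch release per affected series, from "Fix Version/s".
const FIX_PATCH = new Map([["4.2", 19], ["4.4", 13], ["5.0", 6]]);

// Returns true if a plain "major.minor.patch" version is assumed to be affected.
function isAffected(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  const series = `${major}.${minor}`;
  if (!FIX_PATCH.has(series)) {
    // Series not listed are treated as unaffected: releases before 4.2 appear
    // to predate the regression, and 5.1+ no longer supports OP_QUERY exhaust.
    return false;
  }
  return patch < FIX_PATCH.get(series);
}

console.log(isAffected("5.0.5")); // true
console.log(isAffected("5.0.6")); // false
console.log(isAffected("5.1.0")); // false
```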



Comments
Comment by Githook User [ 20/Jan/22 ]

Author: David Storch <david.storch@mongodb.com> (dstorch)

Message: SERVER-62147 Fix broken OP_QUERY exhaust cursor implementation

(cherry picked from commit fb4b3eba611b3bc2408cc3e86fa1d1cba9085fde)
(cherry picked from commit fbcee2558090f25bcaa00879b415f018b7da058b)
Branch: v4.2
https://github.com/mongodb/mongo/commit/d3daee1ab48ffe0dbd0d32cafcc339cf6bbe4f30

Comment by Githook User [ 18/Jan/22 ]

Author: David Storch <david.storch@mongodb.com> (dstorch)

Message: SERVER-62147 Fix broken OP_QUERY exhaust cursor implementation

(cherry picked from commit fb4b3eba611b3bc2408cc3e86fa1d1cba9085fde)
Branch: v4.4
https://github.com/mongodb/mongo/commit/fbcee2558090f25bcaa00879b415f018b7da058b

Comment by Githook User [ 13/Jan/22 ]

Author: David Storch <david.storch@mongodb.com> (dstorch)

Message: SERVER-62147 Fix broken OP_QUERY exhaust cursor implementation
Branch: v5.0
https://github.com/mongodb/mongo/commit/fb4b3eba611b3bc2408cc3e86fa1d1cba9085fde

Generated at Thu Feb 08 05:54:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.