Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.6, 4.2.19, 4.4.13
Affects Version/s: 4.2.17, 4.4.10, 5.0.5
Component/s: Query Execution
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v5.0, v4.4, v4.2
Steps To Reproduce:

Run the following script against a 5.0 server using a 5.0 shell:

    (function() {
        const coll = db.c;
        coll.drop();

        const strSize = 16 * 1024;
        const numDocs = 2000;
        const str = "A".repeat(strSize);

        let bulk = coll.initializeUnorderedBulkOp();
        for (let i = 0; i < numDocs; ++i) {
            bulk.insert({key: str});
        }
        assert.commandWorked(bulk.execute());

        print("attempting exhaust query...");
        assert.eq(2000, coll.find().addOption(DBQuery.Option.exhaust).itcount());
    }());
Sprint:
QE 2021-12-27, QE 2022-01-10, QE 2022-01-24
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Some clients – in particular, the legacy shell, the C driver, and the Python driver – allow the exhaust option to be set on a find operation. This causes the server to write the response to the initial find operation, as well as all subsequent getMore batches, to the socket without waiting for the client to send explicit getMore requests. Exhaust has two implementations in the server, one using the legacy OP_QUERY protocol and another using OP_MSG.
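
As a rough illustration of what exhaust saves, the toy function below counts the requests a client has to send to drain a cursor. It is a simplified model, not driver or server code: `requestsNeeded` and its parameters are hypothetical names, and it assumes each batch after the first costs exactly one getMore on a regular cursor.

```javascript
// Toy model (hypothetical helper, not part of any MongoDB API):
// count client requests needed to receive `totalBatches` batches.
function requestsNeeded(totalBatches, exhaust) {
    // Every cursor starts with a single find request.
    const findRequests = 1;
    // Without exhaust, each batch after the first needs its own getMore.
    // With exhaust, the server streams those batches unprompted.
    const getMoreRequests = exhaust ? 0 : Math.max(totalBatches - 1, 0);
    return findRequests + getMoreRequests;
}

console.log(requestsNeeded(5, false)); // regular cursor
console.log(requestsNeeded(5, true));  // exhaust cursor
```

Under this model an exhaust cursor always costs one request, which is why the client never sends a getMore on its own.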

The server implementation of exhaust cursors using OP_QUERY does not work correctly if the response to the query requires more than two batches. The logic is broken in such a way that after sending the first two batches, the server mistakenly categorizes the cursor as non-exhaust. As a consequence, the server waits for the client to send an explicit getMore request. The client, however, correctly believes that the cursor is an exhaust cursor, and waits for the server to write the next batch to the connection. Both sides are waiting for one another, resulting in a hang. It appears that the server tests did not catch this hang because all of our tests for exhaust queries can fit the response within one or two batches.

The specific flaw in the code is as follows:

When a CanonicalQuery is constructed from a legacy OP_QUERY message, the code fails to tag the CanonicalQuery as being an exhaust query.
When the query is subsequently registered with the CursorManager, this causes the ClientCursor's query options bit vector to have the exhaust bit unset.
The first getMore operation runs successfully. However, in the process it consults the ClientCursor's options bit vector and finds that the exhaust bit is unset.
As a result, the cursor "forgets" that it is an exhaust cursor. It becomes an idle cursor, waiting for the client to issue the next getMore request. This results in a hang.
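
The steps above can be sketched with a small simulation. This is a toy model under stated assumptions, not the server's code: `registerCursorBuggy`, `registerCursorFixed`, and `batchesStreamed` are hypothetical names, and the only protocol detail relied on is that the OP_QUERY exhaust flag is bit 6 of the flags field.

```javascript
// Toy model only; all function names are hypothetical, not MongoDB internals.
const EXHAUST_BIT = 1 << 6; // OP_QUERY "Exhaust" flag (bit 6)

// Models the flaw: the exhaust bit is lost when the cursor is registered.
function registerCursorBuggy(requestFlags) {
    return {options: requestFlags & ~EXHAUST_BIT};
}

// Models the intended behavior: the cursor keeps the request's flags.
function registerCursorFixed(requestFlags) {
    return {options: requestFlags};
}

// Returns how many batches the server pushes before it goes idle
// and waits for an explicit getMore from the client.
function batchesStreamed(registerCursor, requestFlags, totalBatches) {
    const cursor = registerCursor(requestFlags);
    let sent = 1; // reply to the initial find
    // The first getMore is driven by the flags on the original request.
    let exhaust = (requestFlags & EXHAUST_BIT) !== 0;
    while (sent < totalBatches && exhaust) {
        sent++;
        // After each getMore, the stored cursor options decide what happens next.
        exhaust = (cursor.options & EXHAUST_BIT) !== 0;
    }
    return sent;
}

console.log(batchesStreamed(registerCursorBuggy, EXHAUST_BIT, 5));
console.log(batchesStreamed(registerCursorFixed, EXHAUST_BIT, 5));
```

In this model the buggy registration stops streaming after the second batch while the client is still waiting for more, which is the hang described above. A result that fits in one or two batches completes normally in both variants, consistent with the existing tests not catching the problem.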

Note that the OP_QUERY protocol was deprecated in 5.0 and removed entirely from the server in version 5.1, including the OP_QUERY exhaust path. Clients that support exhaust now use OP_MSG exhaust when communicating with servers on version 5.1 or later; this was changed for the legacy mongo shell in SERVER-57462 and for the Python driver in PYTHON-1636. For this reason, the bug does not affect server versions 5.1 and later. Furthermore, it appears to be a regression introduced in version 4.2, so the only affected versions are 4.2, 4.4, and 5.0.

is depended on by

SERVER-62230 Forward port new exhaust cursor tests to the master branch (Closed)

related to

SERVER-68039 Old pymongo version 3.10.1 on MongoDB v5.0 causes Invariant failure (message.operation() == dbMsg) after connection reset by peer (Closed)
CDRIVER-4244 Use OP_MSG for exhaust cursors on 4.2+ servers (Closed)

Assignee: David Storch
Reporter: David Storch
Participants: David Storch, Githook User
Votes: 0
Watchers: 10

Created: Dec 17 2021 03:49:28 PM UTC
Updated: Oct 29 2023 09:44:55 PM UTC
Resolved: Jan 13 2022 07:36:27 PM UTC
Confidence Status Last Update: 22/Dec/21 2:31 PM
