[SERVER-62147] Exhaust query using the OP_QUERY protocol is broken when more than one getMore batch is required Created: 17/Dec/21 Updated: 29/Oct/23 Resolved: 13/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Query Execution |
| Affects Version/s: | 4.2.17, 4.4.10, 5.0.5 |
| Fix Version/s: | 5.0.6, 4.2.19, 4.4.13 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.0, v4.4, v4.2 |
| Steps To Reproduce: | Run the following script against a 5.0 server using a 5.0 shell: |
| Sprint: | QE 2021-12-27, QE 2022-01-10, QE 2022-01-24 |
| Participants: | |
| Description |
|
Some clients (in particular, the legacy shell, the C driver, and the Python driver) allow the exhaust option to be set on a find operation. This causes the server to write the response to the initial find operation, as well as all subsequent getMore batches, to the socket without waiting for the client to send explicit getMore requests. Exhaust has two implementations in the server, one using the legacy OP_QUERY protocol and another using OP_MSG.
The server implementation of exhaust cursors using OP_QUERY does not work correctly if the response to the query requires more than two batches. The logic is broken in such a way that after sending the first two batches, the server mistakenly categorizes the cursor as non-exhaust. As a consequence, the server waits for the client to send an explicit getMore request. The client, however, correctly believes that the cursor is an exhaust cursor, and waits for the server to write the next batch to the connection. Both sides are waiting for one another, resulting in a hang.
It appears that the server tests did not catch this hang because all of our tests for exhaust queries can fit the response within one or two batches.
The specific flaw in the code is as follows:
Note that the OP_QUERY protocol was deprecated in 5.0 and removed entirely from the server in version 5.1. This includes removing support for the OP_QUERY exhaust path. Clients which support exhaust have changed their implementation to use OP_MSG exhaust when communicating with servers >=5.1; this was changed for the legacy mongo shell in |
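The hang described above can be illustrated with a toy model. The following is only a sketch and is not the server code: `ToyServer`, `push_batches`, and `run_exhaust_query` are hypothetical names, and the bug is modeled simply as the exhaust flag being dropped once the second batch has been sent, as the description states. It shows why responses of one or two batches complete while a third batch is never written.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyServer:
    """Hypothetical stand-in for the OP_QUERY exhaust path (not MongoDB code)."""
    total_batches: int
    buggy: bool
    sent: int = 0
    exhaust: bool = True

    def push_batches(self) -> List[int]:
        """Write batches unprompted for as long as the cursor is treated as exhaust."""
        written = []
        while self.exhaust and self.sent < self.total_batches:
            self.sent += 1
            written.append(self.sent)
            # Modeled bug (assumption): after the second batch the cursor
            # is misclassified as non-exhaust, so the server stops pushing
            # and would wait for an explicit getMore.
            if self.buggy and self.sent == 2:
                self.exhaust = False
        return written


def run_exhaust_query(total_batches: int, buggy: bool) -> Tuple[List[int], bool]:
    """Return the batches the client received and whether both sides end up waiting."""
    server = ToyServer(total_batches=total_batches, buggy=buggy)
    received = server.push_batches()
    # An exhaust client never sends getMore; it only reads from the socket.
    # If the server stopped early, neither side will make progress.
    hung = len(received) < total_batches
    return received, hung
```

In this model, `run_exhaust_query(2, buggy=True)` finishes normally, which matches the observation that tests fitting within one or two batches did not catch the problem, while `run_exhaust_query(3, buggy=True)` reports a hang after two batches.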
| Comments |
| Comment by Githook User [ 20/Jan/22 ] |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: (cherry picked from commit fb4b3eba611b3bc2408cc3e86fa1d1cba9085fde) |
| Comment by Githook User [ 18/Jan/22 ] |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: (cherry picked from commit fb4b3eba611b3bc2408cc3e86fa1d1cba9085fde) |
| Comment by Githook User [ 13/Jan/22 ] |
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: |