[SERVER-65317] mongod removes connection from connection pool after running simple $search query Created: 07/Apr/22 Updated: 29/Oct/23 Resolved: 07/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.2, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kevin Rosendahl | Assignee: | Amirsaman Memaripour |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v6.0 | ||||||||
| Steps To Reproduce: | Run an aggregation pipeline with a $search stage followed by a $limit stage, where the $search would return many documents. For example, insert 2500 identical documents { a: "hello" } into a collection test.search that has a search index defined with { dynamic: true }, then run db.search.aggregate([ { $search: { text: { path: "a", query: "hello" } } }, { $limit: 10 } ]). This does not reproduce 100% of the time, but most executions should trigger the behavior. |
||||||||
| Sprint: | Service Arch 2022-05-16, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11 | ||||||||
| Participants: | |||||||||
| Description |
|
We've witnessed mongod log lines similar to the following when running simple $search queries.
This is most easily reproduced when running a $search aggregation with a small $limit. We believe it may be related to TaskExecutorCursor eagerly retrieving a second batch of results even when the first batch has not been fully consumed. When running $search, mongod immediately issues a search command to mongot, and mongot responds with a batch of 101 results. While the query layer is processing the first batch, TaskExecutorCursor fires off a getMore command for the next batch. It appears that if the query completes (e.g. it can fulfill the _id lookup and $limit using the first batch) while this getMore is outstanding, then the connection used for the getMore is seen as having an error when it is returned to the pool, and it is closed. mongod then has to open a new connection to run killCursor for the TaskExecutorCursor. The logs below appear to show such a case.
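The suspected sequence can be illustrated with a small toy model. This is only a sketch of the hypothesis in this ticket, not mongod's implementation: ToyPool, runQuery, simulate, and the inFlight flag are hypothetical names invented for the illustration, and the model assumes that a connection returned to the pool with a request still outstanding is closed rather than reused.

```javascript
// Toy model only: none of these names mirror mongod's real internals.
class ToyPool {
  constructor() {
    this.opened = 0;   // connections opened so far
    this.dropped = 0;  // connections closed on check-in
    this.idle = [];    // connections available for reuse
  }

  checkOut() {
    if (this.idle.length > 0) return this.idle.pop();
    this.opened += 1;
    return { id: this.opened, inFlight: false };
  }

  // Assumption: a connection handed back while a request is still in flight
  // is treated as broken and closed instead of being reused.
  checkIn(conn) {
    if (conn.inFlight) {
      this.dropped += 1;
      return;
    }
    this.idle.push(conn);
  }
}

function runQuery(pool, { limit, batchSize, eagerPrefetch }) {
  // 1. Initial search command; the first batch arrives cleanly.
  const searchConn = pool.checkOut();
  pool.checkIn(searchConn);

  // 2. A getMore is sent if a second batch is needed, or speculatively
  //    when eager prefetching is enabled.
  const needsSecondBatch = limit > batchSize;
  if (eagerPrefetch || needsSecondBatch) {
    const getMoreConn = pool.checkOut();
    getMoreConn.inFlight = true;
    // The query only waits for the reply when it actually needs the batch.
    if (needsSecondBatch) getMoreConn.inFlight = false;
    pool.checkIn(getMoreConn);
  }

  // 3. Cleanup: killing the cursor needs a connection as well.
  const killConn = pool.checkOut();
  pool.checkIn(killConn);
}

function simulate(queries, options) {
  const pool = new ToyPool();
  for (let i = 0; i < queries; i += 1) runQuery(pool, options);
  return { opened: pool.opened, dropped: pool.dropped };
}

// Five queries with a small limit and eager prefetch: one drop per query.
console.log(simulate(5, { limit: 10, batchSize: 101, eagerPrefetch: true }));
// Same queries without eager prefetch: a single connection is reused.
console.log(simulate(5, { limit: 10, batchSize: 101, eagerPrefetch: false }));
```

In this model every small-limit query with eager prefetch drops one connection and opens a new one for cleanup, which matches the connection churn described in this ticket. When the limit exceeds the first batch, the query waits for the getMore reply and nothing is dropped.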
This does not appear to cause a correctness issue, but it does create a lot of connection churn between mongod and mongot, and the log lines can be concerning for users. We were able to reproduce this on at least 5.0 and 5.2.
Acceptance Criteria: Write a test that reproduces this behavior (use the Steps To Reproduce field), then fix the behavior, using the test to prove correctness. |
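As a starting point for such a test, the data and pipeline from the Steps To Reproduce field could be set up roughly as follows. This is a sketch, not the test that was eventually committed: makeDocs is a hypothetical helper, the search index with { dynamic: true } is assumed to already exist on test.search, and the part that talks to the server only runs when a mongosh-style db global is present.

```javascript
// Hypothetical helper: build `count` separate copies of { a: "hello" }.
function makeDocs(count) {
  return Array.from({ length: count }, () => ({ a: "hello" }));
}

// Pipeline from the Steps To Reproduce field: $search followed by a small $limit.
const pipeline = [
  { $search: { text: { path: "a", query: "hello" } } },
  { $limit: 10 },
];

const docs = makeDocs(2500);

// Only runs inside a shell that provides `db` (for example mongosh).
// Assumes the search index with { dynamic: true } has been created beforehand.
if (typeof db !== "undefined") {
  const testDb = db.getSiblingDB("test");
  testDb.search.insertMany(docs);
  const results = testDb.search.aggregate(pipeline).toArray();
  console.log("returned " + results.length + " documents");
}
```

Because the behavior does not reproduce on every execution, a test built on this would likely need to run the aggregation several times and then inspect the mongod log or connection statistics for dropped connections.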
| Comments |
| Comment by Githook User [ 09/Aug/22 ] |
|
Author: {'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'} Message: (cherry picked from commit a7ce4c0c9b3abf5bc27675a4b5edde401371a2fd) |
| Comment by Githook User [ 07/Jul/22 ] |
|
Author: {'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'} Message: |