Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.0.2, 6.1.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v6.0
Steps To Reproduce:

Hide

Run an aggregation pipeline with a $search then a $limit where the $search would return many documents.

For example, add 2500 identical documents { a: "hello" } to a collection test.search with a search index definition with { dynamic: true }, and run

db.search.aggregate([ { $search: { text: { path: "a", query: "hello" }} }, {$limit: 10}])

This doesn't reproduce 100% of the time, but most executions should invoke the behavior.

Show
Run an aggregation pipeline with a $search then a $limit where the $search would return many documents. For example, add 2500 identical documents { a: "hello" } to a collection test.search with a search index definition with { dynamic: true }, and run db.search.aggregate([ { $search: { text: { path: "a", query: "hello" }} }, {$limit: 10}]) This doesn't reproduce 100% of the time, but most executions should invoke the behavior.
Sprint:
Service Arch 2022-05-16, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We've witnessed mongod log lines similar to the following when running simple $search queries.

{
  "t":{"$date":"2022-04-06T23:43:01.391+00:00"},
  "s":"I",
  "c":"CONNPOOL", "id":22566,
  "ctx":"MongotExecutor",
  "msg":"Ending connection due to bad connection status",
  "attr": {
    "hostAndPort":"mongot1:27027",
    "error": "CallbackCanceled: Callback was canceled",
    "numOpenConns":1
  }
}

This is most easily reproducible when running a $search aggregation with a small $limit.

We believe this may be related to TaskExecutorCursor's eager retrieval of a second batch of results, even if the first batch has not been fully consumed.

When running $search, mongod will immediately issue a search command to mongot, and mongot will return with a batch of 101 results. While the query layer is processing the results from the first batch, TaskExecutorCursor will fire off a getMore command on the batch.

It appears that if the query completes (e.g. can fulfill the _id lookup and $limit using the first batch) while this getMore is outstanding, that when returning the connection used for the getMore to the pool, the connection will be seen as having an error, and the connection will be closed.

mongod will then have to open a new connection to run killCursor for the TaskExecutorCursor. The logs below appear to show such a case.

mongot | 2022-04-06T22:09:21.844+0000 T MONGOT [c.x.m.s.wireprotocol.SearchCommand] Received search
mongot | 2022-04-06T22:09:21.850+0000 T MONGOT [c.x.m.s.wireprotocol.GetMoreCommand] Received getMore
mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"MongotExecutor","msg":"Connecting","attr":{"hostAndPort":"mongot1:27027"}}
mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I",  "c":"CONNPOOL", "id":22566,   "ctx":"MongotExecutor","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"mongot1:27027","error":"CallbackCanceled: Callback was canceled","numOpenConns":1}}
mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Closing connection 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b
mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing ops associated with channel 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b
mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing operation: 135
mongot | 2022-04-06T22:09:21.874+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Received connection 0242acfffe160004-0000000c-00000016-185a4190571511cf-619ce6d9 from /172.22.0.3:58152
mongot | 2022-04-06T22:09:21.883+0000 T MONGOT [c.x.m.s.w.KillCursorsCommand] Received killCursors

This does not appear to produce a correctness issue, but does create a lot of connection churn between mongod and mongot, and the log lines can be concerning for users.

We were able to reproduce this on at least 5.0 and 5.2.

Acceptance Criteria: Write a test that reproduces this behavior (use the Steps To Reproduce field) then fix the behavior, using the test as a way to prove correctness.

Assignee:: Amirsaman Memaripour
Reporter:: Kevin Rosendahl
Participants:: Amirsaman Memaripour, Githook User, Kevin Rosendahl
Votes:: 0 Vote for this issue
Watchers:: 12 Start watching this issue

Created:: Apr 07 2022 12:25:29 AM UTC
Updated:: Oct 29 2023 09:39:53 PM UTC
Resolved:: Jul 07 2022 04:04:24 PM UTC
Confidence Status Last Update:: 31/May/22 5:19 PM

Details

Description

Attachments

Forms

Activity

People

Dates