We've witnessed mongod log lines similar to the following when running simple $search queries.
{ "t":{"$date":"2022-04-06T23:43:01.391+00:00"}, "s":"I", "c":"CONNPOOL", "id":22566, "ctx":"MongotExecutor", "msg":"Ending connection due to bad connection status", "attr": { "hostAndPort":"mongot1:27027", "error": "CallbackCanceled: Callback was canceled", "numOpenConns":1 } }
This is most easily reproducible when running a $search aggregation with a small $limit.
We believe this may be related to TaskExecutorCursor's eager retrieval of a second batch of results, even if the first batch has not been fully consumed.
When running $search, mongod will immediately issue a search command to mongot, and mongot will return with a batch of 101 results. While the query layer is processing the results from the first batch, TaskExecutorCursor will fire off a getMore command on the batch.
It appears that if the query completes (e.g. can fulfill the _id lookup and $limit using the first batch) while this getMore is outstanding, that when returning the connection used for the getMore to the pool, the connection will be seen as having an error, and the connection will be closed.
mongod will then have to open a new connection to run killCursor for the TaskExecutorCursor. The logs below appear to show such a case.
mongot | 2022-04-06T22:09:21.844+0000 T MONGOT [c.x.m.s.wireprotocol.SearchCommand] Received search mongot | 2022-04-06T22:09:21.850+0000 T MONGOT [c.x.m.s.wireprotocol.GetMoreCommand] Received getMore mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"MongotExecutor","msg":"Connecting","attr":{"hostAndPort":"mongot1:27027"}} mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I", "c":"CONNPOOL", "id":22566, "ctx":"MongotExecutor","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"mongot1:27027","error":"CallbackCanceled: Callback was canceled","numOpenConns":1}} mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Closing connection 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing ops associated with channel 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing operation: 135 mongot | 2022-04-06T22:09:21.874+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Received connection 0242acfffe160004-0000000c-00000016-185a4190571511cf-619ce6d9 from /172.22.0.3:58152 mongot | 2022-04-06T22:09:21.883+0000 T MONGOT [c.x.m.s.w.KillCursorsCommand] Received killCursors
This does not appear to produce a correctness issue, but does create a lot of connection churn between mongod and mongot, and the log lines can be concerning for users.
We were able to reproduce this on at least 5.0 and 5.2.
Acceptance Criteria: Write a test that reproduces this behavior (use the Steps To Reproduce field) then fix the behavior, using the test as a way to prove correctness.