Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-65317

mongod removes connection from connection pool after running simple $search query

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 6.0.2, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • v6.0
    • Hide

      Run an aggregation pipeline with a $search then a $limit where the $search would return many documents.

      For example, add 2500 identical documents { a: "hello" } to a collection test.search with a search index definition with { dynamic: true }, and run

      db.search.aggregate([ { $search: { text: { path: "a", query: "hello" }} },  {$limit: 10}])

      This doesn't reproduce 100% of the time, but most executions should invoke the behavior.

      Show
      Run an aggregation pipeline with a $search then a $limit where the $search would return many documents. For example, add 2500 identical documents { a: "hello" } to a collection test.search with a search index definition with { dynamic: true }, and run db.search.aggregate([ { $search: { text: { path: "a", query: "hello" }} },  {$limit: 10}]) This doesn't reproduce 100% of the time, but most executions should invoke the behavior.
    • Service Arch 2022-05-16, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11

      We've witnessed mongod log lines similar to the following when running simple $search queries.

      {
        "t":{"$date":"2022-04-06T23:43:01.391+00:00"},
        "s":"I",
        "c":"CONNPOOL", "id":22566,
        "ctx":"MongotExecutor",
        "msg":"Ending connection due to bad connection status",
        "attr": {
          "hostAndPort":"mongot1:27027",
          "error": "CallbackCanceled: Callback was canceled",
          "numOpenConns":1
        }
      }
      

      This is most easily reproducible when running a $search aggregation with a small $limit.

      We believe this may be related to TaskExecutorCursor's eager retrieval of a second batch of results, even if the first batch has not been fully consumed.

      When running $search, mongod will immediately issue a search command to mongot, and mongot will return with a batch of 101 results. While the query layer is processing the results from the first batch, TaskExecutorCursor will fire off a getMore command on the batch.

      It appears that if the query completes (e.g. can fulfill the _id lookup and $limit using the first batch) while this getMore is outstanding, that when returning the connection used for the getMore to the pool, the connection will be seen as having an error, and the connection will be closed.

      mongod will then have to open a new connection to run killCursor for the TaskExecutorCursor. The logs below appear to show such a case.

      mongot | 2022-04-06T22:09:21.844+0000 T MONGOT [c.x.m.s.wireprotocol.SearchCommand] Received search
      mongot | 2022-04-06T22:09:21.850+0000 T MONGOT [c.x.m.s.wireprotocol.GetMoreCommand] Received getMore
      mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"MongotExecutor","msg":"Connecting","attr":{"hostAndPort":"mongot1:27027"}}
      mongod | {"t":{"$date":"2022-04-06T22:09:21.866+00:00"},"s":"I",  "c":"CONNPOOL", "id":22566,   "ctx":"MongotExecutor","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"mongot1:27027","error":"CallbackCanceled: Callback was canceled","numOpenConns":1}}
      mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Closing connection 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b
      mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing ops associated with channel 0242acfffe160004-0000000c-00000015-3263563517157b20-68a2af4b
      mongot | 2022-04-06T22:09:21.867+0000 I MONGOT [c.x.m.s.w.o.op.OperationManager] killing operation: 135
      mongot | 2022-04-06T22:09:21.874+0000 I MONGOT [c.x.m.s.w.WireProtocolQueryServer] Received connection 0242acfffe160004-0000000c-00000016-185a4190571511cf-619ce6d9 from /172.22.0.3:58152
      mongot | 2022-04-06T22:09:21.883+0000 T MONGOT [c.x.m.s.w.KillCursorsCommand] Received killCursors
      

      This does not appear to produce a correctness issue, but does create a lot of connection churn between mongod and mongot, and the log lines can be concerning for users.

      We were able to reproduce this on at least 5.0 and 5.2.

      Acceptance Criteria: Write a test that reproduces this behavior (use the Steps To Reproduce field) then fix the behavior, using the test as a way to prove correctness.

            Assignee:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Reporter:
            kevin.rosendahl@mongodb.com Kevin Rosendahl
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: