  1. Core Server
  2. SERVER-78554

search commands run over PinnedConnectionTaskExecutor can retry before the underlying connection pool processes initial failure

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.1.0-rc0, 7.0.0-rc8, 6.0.9
    • Affects Version/s: None
    • Component/s: None
    • None
    • Service Arch
    • Fully Compatible
    • ALL
    • v7.0, v6.0
    • Service Arch 2023-07-10
    • 34

      $search commands are now retried once in mongod when they fail due to a network error. The infrastructure they run on must report that error to the ConnectionPool, so that the pool can react to the host being unreachable or to problems with sockets to that host.

      Since SERVER-77195, the ConnectionPool reacts to most such reported network errors by closing its current-generation connections to the remote, as the driver CMAP spec specifies, and opening a new generation of connections for future requests.
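
      To illustrate the generation idea, here is a minimal Python sketch. It is not the server's ConnectionPool: `ToyPool`, `Conn`, `checkout`, and `report_network_error` are hypothetical names, and the toy closes every tracked connection immediately, which is a simplification of the real behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    """Hypothetical connection handle tagged with the generation it was opened in."""
    generation: int
    closed: bool = False


@dataclass
class ToyPool:
    """Toy per-host pool; an illustration only, not MongoDB's ConnectionPool."""
    generation: int = 0
    tracked: list = field(default_factory=list)

    def checkout(self) -> Conn:
        # New connections always belong to the pool's current generation.
        conn = Conn(self.generation)
        self.tracked.append(conn)
        return conn

    def report_network_error(self) -> None:
        # Close every connection of the current generation, then bump the
        # generation so that later checkouts get fresh connections.
        for conn in self.tracked:
            if conn.generation == self.generation:
                conn.closed = True
        self.tracked.clear()
        self.generation += 1


pool = ToyPool()
old = pool.checkout()          # opened in generation 0
pool.report_network_error()    # pool learns about the failure
new = pool.checkout()          # opened in generation 1
```

      After the error is reported, `old` is closed while `new` belongs to the next generation and is still usable.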

      When NetworkInterfaceTL runs the RPC/$search, the ConnectionPool is guaranteed to receive the notification of the network error before the $search/RPC can be retried, which ensures that the retry uses a connection from the new generation. With the PinnedConnectionTaskExecutor, however, there is a race: the retry can happen before the ConnectionPool is notified of the error.
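
      The two orderings can be compared with a small deterministic Python simulation. This is a sketch under stated assumptions, not the server code: `MiniPool` and `retry_is_healthy` are hypothetical, and the race is modeled by simply swapping the order of two calls rather than by real threads.

```python
class MiniPool:
    """Hypothetical pool that closes all live connections when told of an error."""

    def __init__(self):
        self.generation = 0
        self.live = []

    def checkout(self):
        conn = {"generation": self.generation, "closed": False}
        self.live.append(conn)
        return conn

    def report_network_error(self):
        # Closing the current generation also fails anything using it.
        for conn in self.live:
            conn["closed"] = True
        self.live = []
        self.generation += 1


def retry_is_healthy(notify_before_retry: bool) -> bool:
    """Return True if the retry ends up holding a connection that stays open."""
    pool = MiniPool()
    pool.checkout()  # first attempt; assume it fails with a network error

    if notify_before_retry:
        # Ordering described for NetworkInterfaceTL: pool hears first.
        pool.report_network_error()
        retry_conn = pool.checkout()
    else:
        # Racy ordering: the retry checks out before the pool is notified.
        retry_conn = pool.checkout()
        pool.report_network_error()

    return not retry_conn["closed"]


healthy_when_notified_first = retry_is_healthy(notify_before_retry=True)
healthy_when_retry_first = retry_is_healthy(notify_before_retry=False)
```

      In this model the retry only keeps a healthy connection when the pool is notified first; in the other ordering the retry's connection is closed along with the rest of the old generation.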

      This isn't strictly a correctness issue, but our testing assumes that the retry gets a healthy connection. That might not hold if the retried request sneaks in before the pool closes its current-generation connections, because the pool also fails pending requests as part of that process. So we should fix either the behavior or the test.

            Assignee:
            george.wangensteen@mongodb.com George Wangensteen (Inactive)
            Reporter:
            george.wangensteen@mongodb.com George Wangensteen (Inactive)
            Votes:
            0
            Watchers:
            5
