[SERVER-78554] search commands run over PinnedConnectionTaskExecutor can retry before the underlying connection pool processes initial failure Created: 29/Jun/23  Updated: 29/Oct/23  Resolved: 07/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc8, 6.0.9

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: George Wangensteen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0
Sprint: Service Arch 2023-07-10
Participants:
Linked BF Score: 34

 Description   

$search commands are now retried once in mongod when they fail due to a network error. The infrastructure they run on must report that network error to the ConnectionPool, so that the pool can react to the host being unreachable or to problems with sockets to that host.

Since SERVER-77195, the ConnectionPool reacts to most such reported network errors by closing the current-generation connections to the remote, as the driver CMAP spec specifies, and opening a new generation of connections for future requests.
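The generation behavior described above can be illustrated with a toy model. This is only a sketch: `ToyPool`, `Connection`, `checkout`, and `on_network_error` are hypothetical names invented for illustration and are not the server's ConnectionPool API.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    generation: int
    is_open: bool = True


@dataclass
class ToyPool:
    """Toy stand-in for a per-host pool (hypothetical, not the server's ConnectionPool)."""
    generation: int = 0
    connections: list = field(default_factory=list)

    def checkout(self) -> Connection:
        # A newly created connection belongs to the pool's current generation.
        conn = Connection(self.generation)
        self.connections.append(conn)
        return conn

    def on_network_error(self) -> None:
        # Close every connection of the current generation, then advance the
        # generation so that later checkouts get fresh connections.
        for conn in self.connections:
            if conn.generation == self.generation:
                conn.is_open = False
        self.generation += 1


pool = ToyPool()
old_conn = pool.checkout()      # generation 0
pool.on_network_error()         # pool learns about the network error
new_conn = pool.checkout()      # generation 1
```

In this model, a request issued after `on_network_error` always lands on a connection from the new generation, which is the property the retry relies on.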

When NetworkInterfaceTL runs the RPC/$search, the ConnectionPool is guaranteed to receive notification of the network error before the RPC/$search can be retried, which ensures that the retry uses a connection from the new generation. With the PinnedConnectionTaskExecutor, however, there is a race: the retry can happen before the ConnectionPool is notified of the error.

This isn't strictly a correctness issue, but our testing assumes that the retry will get a healthy connection. That assumption can fail if the retried request sneaks in before the pool closes its current-gen connections, because the pool also fails pending requests as part of that process. So we should fix either the behavior or the test.
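The two orderings can be compared with a deterministic toy model. This is a sketch under stated assumptions: `ToyRetryPool`, `submit`, `process_failure`, and `run_once` are hypothetical names, and the real race is between threads rather than an `if` branch.

```python
class ToyRetryPool:
    """Toy pool that tracks pending requests (hypothetical, not the server's API)."""

    def __init__(self):
        self.generation = 0
        self.pending = []

    def submit(self, name):
        # A request is bound to whichever generation is current when it is issued.
        request = {"name": name, "generation": self.generation, "failed": False}
        self.pending.append(request)
        return request

    def process_failure(self):
        # Processing a reported network error fails all pending requests
        # and moves the pool to a new generation.
        for request in self.pending:
            request["failed"] = True
        self.pending.clear()
        self.generation += 1


def run_once(notify_pool_first):
    """Model the retry of a command whose first attempt hit a network error."""
    pool = ToyRetryPool()
    if notify_pool_first:
        # Ordering the description attributes to NetworkInterfaceTL:
        # the pool hears about the error before the retry is issued.
        pool.process_failure()
        retry = pool.submit("retry")
    else:
        # Racy ordering: the retry is issued first and is then failed
        # when the pool finally processes the original error.
        retry = pool.submit("retry")
        pool.process_failure()
    return retry


safe_retry = run_once(notify_pool_first=True)
racy_retry = run_once(notify_pool_first=False)
```

In the first ordering the retry runs on the new generation and survives; in the second it is bound to the old generation and is failed along with the other pending requests, which is the outcome the test does not expect.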



 Comments   
Comment by Githook User [ 11/Jul/23 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-78554 PinnedConnectionTaskExecutor destroys unhealthy stream before notifying of command completion (cherry picked from commit b33a9c90e78ae15c7d27f0d57acad1bb04fc7bff)
Branch: v6.0
https://github.com/mongodb/mongo/commit/765b955671d340c6f35b64416a96ed213295c197

Comment by Githook User [ 11/Jul/23 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-78554 PinnedConnectionTaskExecutor destroys unhealthy stream before notifying of command completion

(cherry picked from commit b33a9c90e78ae15c7d27f0d57acad1bb04fc7bff)
Branch: v7.0
https://github.com/mongodb/mongo/commit/6b5b68ebaf86c86db033df14b9d1a51b91c161f1

Comment by Githook User [ 07/Jul/23 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-78554 PinnedConnectionTaskExecutor destroys unhealthy stream before notifying of command completion
Branch: master
https://github.com/mongodb/mongo/commit/b33a9c90e78ae15c7d27f0d57acad1bb04fc7bff
