[SERVER-84289] Differentiate between network timeouts and connection failures in gRPC Created: 18/Dec/23  Updated: 19/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Patrick Freed Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

Description

grpc::Client currently uses WaitForConnected in its connection-establishment logic. As its name suggests, WaitForConnected waits up to the provided timeout for the gRPC channel to enter the connected state (READY in gRPC's connectivity-state terminology). Within that window, however, the channel may make several reconnection attempts, with backoff and jitter between them, so a failed attempt is not reported until the timeout is hit. This differs from asio, which reports connection errors immediately (e.g. "Connection refused" if the remote is down).

We should update this logic to stop after the first connection attempt fails, so that a connection failure can be reported separately from a timeout. As an example of the impact, the shell would no longer need to wait the full connectTimeout when attempting to reach a server that is down.
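
A minimal sketch of the proposed behavior, not the actual grpc::Client code. It assumes a small hypothetical interface, ChannelLike, whose two methods mirror gRPC's GetState(try_to_connect) and WaitForStateChange(last_observed, deadline); the names ChannelLike, ChannelState, ConnectOutcome, connectOnce, and ScriptedChannel are all invented for illustration. The idea is to watch state transitions directly and return as soon as the first attempt ends in TRANSIENT_FAILURE, rather than letting the channel keep retrying until the deadline.

```cpp
#include <chrono>
#include <deque>
#include <utility>

// Mirrors gRPC's connectivity states (IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN).
enum class ChannelState { kIdle, kConnecting, kReady, kTransientFailure, kShutdown };

// Lets callers tell a refused/failed connection apart from a timeout.
enum class ConnectOutcome { kConnected, kConnectionFailed, kTimedOut };

using Clock = std::chrono::steady_clock;

// Hypothetical seam over a gRPC channel; not a real gRPC or server type.
class ChannelLike {
public:
    virtual ~ChannelLike() = default;
    // Returns the current state; if tryToConnect is true, an idle channel starts connecting.
    virtual ChannelState getState(bool tryToConnect) = 0;
    // Blocks until the state differs from lastObserved; returns false if the deadline passes first.
    virtual bool waitForStateChange(ChannelState lastObserved, Clock::time_point deadline) = 0;
};

// Waits for the first connection attempt to finish, instead of retrying until the deadline.
ConnectOutcome connectOnce(ChannelLike& channel, Clock::time_point deadline) {
    ChannelState state = channel.getState(/*tryToConnect=*/true);
    while (true) {
        switch (state) {
            case ChannelState::kReady:
                return ConnectOutcome::kConnected;
            case ChannelState::kTransientFailure:
            case ChannelState::kShutdown:
                // First attempt failed: report it now rather than waiting through backoff.
                return ConnectOutcome::kConnectionFailed;
            case ChannelState::kIdle:
            case ChannelState::kConnecting:
                break;  // Attempt still in progress; keep waiting.
        }
        if (!channel.waitForStateChange(state, deadline)) {
            return ConnectOutcome::kTimedOut;
        }
        state = channel.getState(/*tryToConnect=*/false);
    }
}

// Test double that replays a fixed, non-empty sequence of states.
class ScriptedChannel final : public ChannelLike {
public:
    explicit ScriptedChannel(std::deque<ChannelState> states) : _states(std::move(states)) {}

    ChannelState getState(bool) override {
        return _states.front();
    }

    bool waitForStateChange(ChannelState, Clock::time_point) override {
        if (_states.size() <= 1) {
            return false;  // No further transitions scripted: simulate hitting the deadline.
        }
        _states.pop_front();
        return true;
    }

private:
    std::deque<ChannelState> _states;
};
```

With a channel scripted as IDLE, CONNECTING, TRANSIENT_FAILURE, CONNECTING, READY, connectOnce returns kConnectionFailed at the first failure and never observes the later retry. A channel that stays in CONNECTING past the deadline yields kTimedOut, which is the distinction this ticket asks for.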

