- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Networking & Observability
- Egress gRPC 2025-01-31, Egress gRPC 2025-02-14
- 1
The current gRPC stream establishment timeout enforcement doesn't distinguish between timeouts caused by being unable to reach the remote (i.e. the channel never reaching the CONNECTED state) and timeouts caused by there already being MAX_CONCURRENT_STREAMS streams open to the remote, which is a resource contention problem rather than a connectivity one. Both are reported as "NetworkTimeout", which can be misleading.
In addition, we do not set wait_for_ready to true, so if a user attempts to establish a stream while the channel is not in the CONNECTED state, the attempt fails immediately with a HostUnreachable error. This may be desirable, but it could also make sense to wait up to connectTimeoutMS for the channel to reach that state in the background before giving up. Under the hood, gRPC retries connection attempts with exponential backoff, which we could take advantage of.
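The waiting behaviour proposed above can be sketched roughly as follows. This is an illustration in Python using only the standard library, not the real gRPC API or the server's code: `ChannelState`, `wait_until_connected`, `get_state`, and the polling interval are all hypothetical names and assumptions. With the actual library the wait would presumably come from wait_for_ready rather than an explicit polling loop; the sketch only shows the intended semantics of failing after a connectTimeoutMS-style deadline instead of on the first non-CONNECTED observation.

```python
import time
from enum import Enum, auto
from typing import Callable


class ChannelState(Enum):
    # Simplified, hypothetical states; not gRPC's real connectivity enum.
    CONNECTING = auto()
    CONNECTED = auto()


class HostUnreachable(Exception):
    """Raised when the channel did not become CONNECTED before the deadline."""


def wait_until_connected(
    get_state: Callable[[], ChannelState],
    connect_timeout_ms: int,
    poll_interval_ms: int = 10,
) -> None:
    """Block until get_state() reports CONNECTED or the timeout elapses."""
    deadline = time.monotonic() + connect_timeout_ms / 1000.0
    while True:
        if get_state() is ChannelState.CONNECTED:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise HostUnreachable(
                f"channel not CONNECTED after {connect_timeout_ms} ms"
            )
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_ms / 1000.0, remaining))
```

Because the connectivity wait raises its own error before any stream is attempted, a timeout that happens afterwards can be attributed to stream establishment rather than to reaching the remote.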
Waiting for the channel to become established before failing would also let us differentiate between connectivity timeouts and timeouts establishing streams. It would not, however, help us differentiate between network timeouts and hitting MAX_CONCURRENT_STREAMS, though one could argue that we don't need to. If we did want to differentiate between those, we could enforce a client-configured MAX_CONCURRENT_STREAMS in the async client factory.
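One way a client-enforced stream limit could surface contention as its own error is sketched below. This is a standard-library Python illustration under stated assumptions, not the async client factory itself: `StreamLimiter`, `StreamLimitTimeout`, and `open_stream` are hypothetical names, and a thread-based semaphore stands in for whatever asynchronous mechanism the factory would actually use.

```python
import threading
from contextlib import contextmanager


class StreamLimitTimeout(Exception):
    """Raised when no stream slot frees up in time (resource contention)."""


class StreamLimiter:
    """Caps the number of concurrently open streams on the client side."""

    def __init__(self, max_concurrent_streams: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent_streams)

    @contextmanager
    def open_stream(self, timeout_ms: int):
        # Waiting here is contention, not a network problem, so it gets
        # its own error type instead of a generic network timeout.
        if not self._slots.acquire(timeout=timeout_ms / 1000.0):
            raise StreamLimitTimeout(
                f"no free stream slot within {timeout_ms} ms"
            )
        try:
            yield
        finally:
            # Free the slot when the stream is closed, even on error.
            self._slots.release()
```

Since the client never exceeds its own limit, any timeout raised by the limiter is known to be contention, and a timeout seen after a slot was acquired can be attributed to the network.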