-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Networking & Observability
-
Egress gRPC 2025-02-14
-
3
Doing this would provide additional isolation guarantees for each component that uses an AsyncClientFactory: one task executor could no longer create runaway streams that interfere with another executor sharing this global channel pool.
This would also help with a possible diagnosability issue: timeouts caused by streams queueing on a single channel may surface as NetworkTimeouts when they are really overload situations. This isn't very different from existing MongoRPC issues, but we can add a log statement to make it more diagnosable. For that log to be useful, however, we need to know the number of streams per remote per task executor. Moving channel pool ownership up to the AsyncClientFactory (so that nothing else adds streams to the channel) and logging active streams from the _endpoints map in the factory would let us diagnose a channel carrying many streams more quickly.
However, in the context of Search for Community, this is not a major concern: only two executors will share the gRPC channel pool, and one of them is used only for index management commands, which we do not expect to be resource-intensive or frequent.
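As a rough illustration of the per-factory stream accounting proposed above, here is a minimal C++ sketch. It is not taken from the codebase: the class name FactoryStreamTracker, its methods, the warning threshold, and the log line format are all hypothetical, and only the _endpoints member name mirrors the map mentioned in this ticket. It assumes each factory owns its own tracker, so the counts reflect only the streams created by that factory's task executor.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical per-factory registry of active streams, keyed by remote.
// One instance per AsyncClientFactory is assumed, so no other component
// can add streams that this tracker does not know about.
class FactoryStreamTracker {
public:
    FactoryStreamTracker(std::string executorName, std::size_t warnThreshold)
        : _executorName(std::move(executorName)), _warnThreshold(warnThreshold) {}

    // Records a newly opened stream and returns the active count for `remote`.
    std::size_t onStreamOpened(const std::string& remote) {
        std::lock_guard<std::mutex> lk(_mutex);
        return ++_endpoints[remote];
    }

    // Records a closed stream; unknown remotes are ignored.
    void onStreamClosed(const std::string& remote) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _endpoints.find(remote);
        if (it == _endpoints.end()) {
            return;
        }
        if (--it->second == 0) {
            _endpoints.erase(it);
        }
    }

    std::size_t activeStreams(const std::string& remote) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _endpoints.find(remote);
        return it == _endpoints.end() ? 0 : it->second;
    }

    // Builds the diagnostic text to log next to a timeout. Returns an empty
    // string when the remote is below the (assumed) warning threshold.
    std::string describeOverload(const std::string& remote) const {
        const std::size_t n = activeStreams(remote);
        if (n < _warnThreshold) {
            return {};
        }
        std::ostringstream os;
        os << "executor=" << _executorName << " remote=" << remote
           << " activeStreams=" << n;
        return os.str();
    }

private:
    std::string _executorName;
    std::size_t _warnThreshold;
    mutable std::mutex _mutex;
    std::map<std::string, std::size_t> _endpoints;  // remote -> active streams
};
```

In this sketch, a timeout handler would call describeOverload(remote) and, if the result is non-empty, include it in the log line so that a NetworkTimeout can be distinguished from an overloaded channel.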