Type: Improvement
Resolution: Won't Do
Priority: Major - P3
None
Affects Version/s: None
Component/s: None
None
Networking & Observability
The ReplicaSetMonitor-TaskExecutor currently uses the NetworkInterfaceThreadPool in its ThreadPoolTaskExecutor. Despite its name, this isn't actually a thread pool; it's just an OutOfLineExecutor that schedules tasks on the NetworkInterface's single reactor thread.
Using a single thread for the ReplicaSetMonitor-TaskExecutor becomes a problem when there are many RSM-related tasks to perform, particularly during a network disruption in a cluster with many monitored hosts. In extreme cases, the reactor thread can fall so far behind that it stops making forward progress, which prevents the RSM from discovering hosts and affects availability.
To help avoid such cases, we could instead use an actual ThreadPool executor for RSM-related work, and use the reactor thread only for the networking tasks the RSM needs to perform.
Related to:
SERVER-91479 Reject connection acquisition attempts if queue gets too long (Open)