Details
Type: Improvement
Resolution: Done
Priority: Major - P3
Description
Currently, there is a 1-1 relationship between RemoteCommandTargeter and ReplicaSetMonitor (RSM). Because of this, if the RSM becomes unusable because none of its hosts are reachable, the targeter will keep using that unusable RSM forever. This may happen if shards are inaccessible initially.
What saves us in this case is that the balancer loop periodically (every 30 seconds) reloads the shard registry, which recreates the targeters and installs new RSMs.
We should make the RemoteCommandTargeter own the ReplicaSetMonitor and introduce a polling thread in the ShardRegistry so that we have finer control over its behaviour.
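A rough sketch of the proposed ownership model, for illustration only. Every class and method name here (SketchMonitor, SketchTargeter, SketchRegistry, recordScan, replaceMonitorIfUnusable, pollOnce) is hypothetical and does not correspond to the actual server code. The sketch assumes a monitor is "unusable" when its last scan reached no hosts, and that a freshly created monitor is treated as usable until its first scan says otherwise.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a replica set monitor; not the real class.
class SketchMonitor {
public:
    explicit SketchMonitor(std::vector<std::string> hosts) : _hosts(std::move(hosts)) {}

    // Records the outcome of a (simulated) scan of all hosts.
    void recordScan(bool anyHostReachable) { _usable = anyHostReachable; }

    bool isUsable() const { return _usable; }

private:
    std::vector<std::string> _hosts;
    bool _usable = true;  // assumed usable until a scan finds no reachable host
};

// Hypothetical targeter that owns its monitor, so it can replace it.
class SketchTargeter {
public:
    explicit SketchTargeter(std::vector<std::string> seedHosts)
        : _seedHosts(std::move(seedHosts)),
          _monitor(std::make_unique<SketchMonitor>(_seedHosts)) {}

    SketchMonitor& monitor() { return *_monitor; }

    // Installs a fresh monitor if the current one is unusable.
    // Returns true if a replacement happened.
    bool replaceMonitorIfUnusable() {
        if (_monitor->isUsable()) {
            return false;
        }
        _monitor = std::make_unique<SketchMonitor>(_seedHosts);
        ++_generation;
        return true;
    }

    int generation() const { return _generation; }

private:
    std::vector<std::string> _seedHosts;      // declared first: used to build _monitor
    std::unique_ptr<SketchMonitor> _monitor;  // owned by the targeter
    int _generation = 0;                      // number of replacements so far
};

// Hypothetical registry; a polling thread would call pollOnce() on an interval.
class SketchRegistry {
public:
    void addShard(const std::string& shardId, std::vector<std::string> hosts) {
        _targeters.emplace(shardId, std::make_unique<SketchTargeter>(std::move(hosts)));
    }

    SketchTargeter& targeter(const std::string& shardId) { return *_targeters.at(shardId); }

    // One polling pass over all targeters; returns how many monitors were replaced.
    int pollOnce() {
        int replaced = 0;
        for (auto& entry : _targeters) {
            if (entry.second->replaceMonitorIfUnusable()) {
                ++replaced;
            }
        }
        return replaced;
    }

private:
    std::map<std::string, std::unique_ptr<SketchTargeter>> _targeters;
};
```

Keeping the pass in a single pollOnce() call means the polling interval can be chosen by whatever thread drives it, independently of the balancer loop. Synchronization is omitted here; a real polling thread would need locking around the targeter map and the monitor pointer.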