[SERVER-21182] Fix replica set monitor ownership Created: 22/Oct/15 Updated: 12/Aug/19 Resolved: 12/Aug/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Crystal Horn | Assignee: | Benjamin Caimano (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Currently, there is a 1-1 relationship between RemoteCommandTargeter and ReplicaSetMonitor. Because of this, if the RSM becomes unusable because none of its hosts are reachable, the targeter will keep using an unusable RSM forever. This can happen if shards are inaccessible at startup. What saves us in this case is that the balancer loop periodically (every 30 sec) reloads the shard registry, which recreates the targeters and installs new RSMs. We should make the RemoteCommandTargeter own the ReplicaSetMonitor and introduce a polling thread in the ShardRegistry so that we have finer control over its behaviour. |
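The ownership change proposed in the description can be sketched roughly as follows. This is an illustrative Python model only, not the server's C++ code: `Monitor`, `Targeter`, `Registry`, `poll_once` and the `is_reachable` callback are hypothetical names, and the "permanently unusable once every host is unreachable" behaviour is a simplification of what the description reports.

```python
from typing import Callable, Dict, List


class Monitor:
    """Hypothetical stand-in for a ReplicaSetMonitor."""

    def __init__(self, hosts: List[str], is_reachable: Callable[[str], bool]):
        self.hosts = hosts
        self.is_reachable = is_reachable
        self.failed = False

    def scan(self) -> None:
        # Assumption: once failed, a monitor never recovers on its own.
        if self.failed:
            return
        if not any(self.is_reachable(h) for h in self.hosts):
            self.failed = True


class Targeter:
    """Hypothetical stand-in for a RemoteCommandTargeter that owns its monitor."""

    def __init__(self, hosts: List[str], is_reachable: Callable[[str], bool]):
        self._hosts = list(hosts)
        self._is_reachable = is_reachable
        self.monitor = Monitor(self._hosts, self._is_reachable)

    def refresh(self) -> None:
        # Because the targeter owns the monitor, it can swap out a failed one
        # instead of waiting for the whole registry to be reloaded.
        if self.monitor.failed:
            self.monitor = Monitor(self._hosts, self._is_reachable)
        self.monitor.scan()


class Registry:
    """Hypothetical stand-in for the ShardRegistry."""

    def __init__(self) -> None:
        self.targeters: Dict[str, Targeter] = {}

    def add(self, shard: str, hosts: List[str],
            is_reachable: Callable[[str], bool]) -> None:
        self.targeters[shard] = Targeter(hosts, is_reachable)

    def poll_once(self) -> None:
        # One iteration of what a dedicated polling thread would run.
        for targeter in self.targeters.values():
            targeter.refresh()
```

In this sketch a polling thread would simply call `poll_once` on whatever interval is chosen, independently of the balancer loop. A shard that was unreachable at startup then gets a fresh monitor on the next poll after it comes back.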
| Comments |
| Comment by Benjamin Caimano (Inactive) [ 12/Aug/19 ] |
|
There is currently one RSM to many RCTs/DBClientRSes, so the relationship is no longer 1-1. The RSM no longer becomes permanently unusable if none of its hosts are reachable. It wouldn't surprise me if it did at one point, but that hasn't been the case since at least 3.6. It has also done expedited scanning since early 4.1. I'd say we don't have to worry about this ticket any more. |
| Comment by Benjamin Caimano (Inactive) [ 25/Jul/19 ] |
|
Stealing to service arch and investigating |
| Comment by Andy Schwerin [ 09/Nov/15 ] |
|
Kal, please put a more complete description on this and move to 3.1 Desired. |