[SERVER-27949] Failure to parse isMaster response in ReplicaSetMonitor can cause bad information to be written to config.shards
Created: 08/Feb/17 | Updated: 27/Oct/23 | Resolved: 12/Aug/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.0.14, 3.2.12, 3.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Benjamin Caimano (Inactive) |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Sprint: | Sharding 2017-03-27 |
| Participants: |
| Description |
|
When used as part of sharding, the ReplicaSetMonitor (RSM) keeps the config.shards entry for a replica set shard up to date as the replica set's membership changes. If parsing the isMaster response fails for any reason, the RSM treats the replica set as containing no hosts and writes a bad entry to config.shards. That entry is eventually read by all other shards and mongos instances, rendering the shard unusable until it is manually repaired. This was evidenced in [...]. To be on the safe side, if the RSM fails to parse the isMaster response for any reason, it should log a warning but not update config.shards. |
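The proposed behavior can be illustrated with a minimal Python sketch. It is not the server's C++ implementation: `parse_is_master`, `on_is_master_reply`, `IsMasterParseError`, and the in-memory `config_shards` dict are hypothetical names used only for illustration, and treating an empty host list as a parse failure is an assumption of this sketch rather than something the ticket specifies.

```python
import logging

logger = logging.getLogger("rsm_sketch")


class IsMasterParseError(Exception):
    """Raised when a reply cannot be read as a usable isMaster response."""


def parse_is_master(reply):
    """Return (set_name, sorted_hosts) or raise IsMasterParseError."""
    if not isinstance(reply, dict) or reply.get("ok") != 1:
        raise IsMasterParseError("reply is not an ok document")
    set_name = reply.get("setName")
    hosts = reply.get("hosts")
    if not isinstance(set_name, str):
        raise IsMasterParseError("missing or invalid setName")
    # Assumption of this sketch: an absent or empty host list is a failure.
    if not isinstance(hosts, list) or not hosts:
        raise IsMasterParseError("missing or empty hosts")
    if not all(isinstance(h, str) for h in hosts):
        raise IsMasterParseError("non-string host entry")
    return set_name, sorted(hosts)


def on_is_master_reply(reply, config_shards, shard_id):
    """Update config_shards[shard_id] only when the reply parses cleanly.

    config_shards is a plain dict standing in for the config.shards
    collection. Returns True if the entry was written, False otherwise.
    """
    try:
        set_name, hosts = parse_is_master(reply)
    except IsMasterParseError as exc:
        # Log a warning and leave the existing entry untouched.
        logger.warning("ignoring isMaster reply for %s: %s", shard_id, exc)
        return False
    config_shards[shard_id] = set_name + "/" + ",".join(hosts)
    return True
```

The key design choice is that a parse failure is surfaced as an exception rather than as an empty host list, so the caller cannot mistake "could not parse" for "the set has no members".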
| Comments |
| Comment by Benjamin Caimano (Inactive) [ 12/Aug/19 ] |
|
If we fail to find the primary for a replica set, we update the shard registry with the last set of nodes we were able to consider for the cluster (see here). We only actually remove nodes when we hear from the primary, so the primary would have to claim to be okay but send an empty list of hosts for config.shards to be affected. For reference, this is where the mongos would update its registry with the list of possible nodes. |
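The rule described in this comment can be sketched roughly as follows. This is a simplified Python model, not the actual ReplicaSetMonitor code: `NodeTracker` and `on_reply` are hypothetical names, and letting non-primary replies add candidate nodes is an assumption of the sketch.

```python
class NodeTracker:
    """Tracks the set of nodes considered part of a replica set."""

    def __init__(self, seed_hosts):
        self.known = set(seed_hosts)

    def on_reply(self, is_primary, hosts):
        """Fold one node's reported host list into the known set.

        Only a reply from the primary may remove nodes; any other reply
        can at most add candidates (an assumption of this sketch).
        Returns the sorted list that would be reported to the registry.
        """
        reported = set(hosts)
        if is_primary:
            # The primary is authoritative, so its list replaces ours.
            self.known = reported
        else:
            self.known |= reported
        return sorted(self.known)


tracker = NodeTracker(["a:27017", "b:27017"])
# A non-primary reporting no hosts leaves the last known set in place.
after_secondary = tracker.on_reply(is_primary=False, hosts=[])
# A primary that claims to be okay but reports no hosts empties the set,
# which is the remaining edge case the comment describes.
after_primary = tracker.on_reply(is_primary=True, hosts=[])
```

In this model the last known node set survives any number of replies from non-primaries, and only an authoritative reply with an empty host list can clear it.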
| Comment by Benjamin Caimano (Inactive) [ 25/Jul/19 ] |
|
Stealing to service arch, this may or may not be an issue anymore. |