[SERVER-27949] Failure to parse isMaster response in ReplicaSetMonitor can cause bad information to be written to config.shards Created: 08/Feb/17  Updated: 27/Oct/23  Resolved: 12/Aug/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.14, 3.2.12, 3.4.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Benjamin Caimano (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27793 'exception while parsing isMaster rep... Closed
Operating System: ALL
Sprint: Sharding 2017-03-27
Participants:

Description

When used as part of sharding, the ReplicaSetMonitor (RSM) keeps the config.shards entry for a particular replica set shard up to date as the replica set's membership changes.

If parsing the isMaster response fails for whatever reason, the RSM will consider the replica set as containing no hosts and will write a bad entry to config.shards. That entry will eventually be read by all other shards and mongos instances, rendering the shard unusable until it is manually repaired. This was observed in SERVER-27793; the specific error is quoted in a linked comment.

To be on the safe side, if the RSM fails to parse the isMaster response for any reason, it should log a warning but not update config.shards.
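
The proposed guard could look roughly like the sketch below. It is written in Python for brevity rather than in the server's C++, and it is only an illustration: `parse_hosts`, `maybe_update_shard_entry`, and the `write_entry` callback are hypothetical names, not part of the MongoDB code base, and the reply is modelled as a plain dict with a `hosts` list. Skipping the update on an empty host list is an extra assumption beyond what the description asks for.

```python
import logging

logger = logging.getLogger("rsm_sketch")


def parse_hosts(reply):
    """Return the host list from an isMaster-style reply, or raise ValueError."""
    hosts = reply.get("hosts") if isinstance(reply, dict) else None
    if not isinstance(hosts, list) or not all(isinstance(h, str) for h in hosts):
        raise ValueError("malformed isMaster reply: %r" % (reply,))
    return hosts


def maybe_update_shard_entry(set_name, reply, write_entry):
    """Call write_entry only when the reply parses and lists at least one host.

    Returns True if an update was written, False if it was skipped.
    """
    try:
        hosts = parse_hosts(reply)
    except ValueError as exc:
        # Parse failure: warn and leave the stored entry untouched.
        logger.warning("skipping config.shards update for %s: %s", set_name, exc)
        return False
    if not hosts:
        # Assumption: an empty host list is never worth persisting.
        logger.warning("skipping config.shards update for %s: empty host list", set_name)
        return False
    # Build a setName/host1,host2 style connection string for the entry.
    write_entry(set_name + "/" + ",".join(hosts))
    return True
```

Here `write_entry` stands in for whatever actually persists the shard's host string; passing a list's `append` method is enough to exercise the guard in isolation.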



Comments
Comment by Benjamin Caimano (Inactive) [ 12/Aug/19 ]

If we fail to find the primary for a replica set, we update the shard registry with the last set of nodes we were able to consider for the cluster (see the linked code). We only actually remove nodes when we hear from the primary, so for config.shards to be affected, the primary would have to claim to be okay but send an empty list of hosts. For reference, the linked code is where the mongos would update its registry with the list of possible nodes.
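
The behaviour described in this comment can be modelled with a small sketch, again in Python and purely illustrative. `MembershipSketch` and `on_reply` are hypothetical names, and letting non-primary replies add hosts to the known set is an assumption; the comment itself only states that removal requires hearing from the primary.

```python
class MembershipSketch:
    """Toy model of a node set that only a primary's reply may shrink."""

    def __init__(self, seed_hosts):
        self.hosts = set(seed_hosts)

    def on_reply(self, is_primary, reported_hosts):
        """Apply one reply and return the resulting host list, sorted."""
        if is_primary:
            # The primary is authoritative: its list replaces ours, so
            # hosts it does not report are dropped.
            self.hosts = set(reported_hosts)
        else:
            # Assumption: other members can add candidates but never
            # cause a removal.
            self.hosts |= set(reported_hosts)
        return sorted(self.hosts)
```

In this model a non-primary reply with no hosts leaves the set unchanged, while a primary that reports an empty list empties it, which is the one case the comment identifies as able to affect config.shards.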

Comment by Benjamin Caimano (Inactive) [ 25/Jul/19 ]

Stealing to service arch; this may or may not be an issue anymore.

Generated at Thu Feb 08 04:16:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.