-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
ALL
-
We have a sharded development cluster running 3.2.18 that we are moving from SCCC to Replicated config servers.
The cluster consists of 3 config servers, 18 MongoD shards, and 5 MongoS's.
At one point in the migration, after we had switched to the replicated config server configuration, we accidentally issued a bad rs.add() command from the mongo shell against the primary config server:
> rs.add("csReplSet/dev-config-2.domain.com:27019")
We should not have included the initial "csReplSet/" within the string (human error).
However, what happened next was concerning.
While the config servers were fine (rs.status showed it couldn't reach the new host), every MongoS and MongoD host issued a backtrace / core dump and terminated.
Here is what the Config server rs.status reported for the added host:
"_id" : 3,
"name" : "csReplSet/dev-config-3.domain.com:27019",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
Here's the log output from a MongoD (shard) just before the backtrace.
2018-09-14T15:38:55.179+0000 I NETWORK [ReplicaSetMonitorWatcher] changing hosts to csReplSet/dev-config-0.domain.com:27019,dev-config-4.domain.com:27019,csReplSet/dev-config-3.domain.com:27019 from csReplSet/dev-config-0.domain.com:27019,dev-config-4.domain.com:27019
2018-09-14T15:38:55.179+0000 I - [ReplicaSetMonitorWatcher] Invariant failure setName == connString.getSetName() src/mongo/s/config.cpp 770
2018-09-14T15:38:55.179+0000 I - [ReplicaSetMonitorWatcher]
It looks like the MongoS and MongoD hosts tried to adjust their config server list to add the new host, but did not validate the hostname before trying to use it?
We recovered our development environment from backup, and are going to be testing our process again. While I don't have the full list of log files to provide here, we could try this again if you need more details.
Reproducing it shouldn't be hard, though: just add a bad host to the config server replica set!
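Until the server rejects such input itself, a small guard in the migration script could catch this kind of typo before it reaches rs.add(). The following is only a sketch: isPlainHostPort and safeAdd are hypothetical helper names, not part of the mongo shell, and the check assumes members are always given as hostname:port (no IPv6 literals, no omitted port).

```javascript
// Hypothetical helper, not part of the mongo shell.
// Returns true only for a plain "host:port" string; rejects anything that
// looks like a replica set connection string ("setName/host:port").
function isPlainHostPort(member) {
  if (typeof member !== "string") return false;
  // A "/" means a set name prefix slipped into the member string.
  if (member.indexOf("/") !== -1) return false;
  // A "," means several hosts were passed at once.
  if (member.indexOf(",") !== -1) return false;
  var m = /^([^\s:]+):(\d{1,5})$/.exec(member);
  if (m === null) return false;
  var port = Number(m[2]);
  return port >= 1 && port <= 65535;
}

// Hypothetical wrapper: rsHandle is expected to be the shell's `rs` object
// (or anything else exposing an add(member) method).
function safeAdd(rsHandle, member) {
  if (!isPlainHostPort(member)) {
    throw new Error("refusing to add '" + member + "': expected host:port");
  }
  return rsHandle.add(member);
}
```

In the shell this would be called as safeAdd(rs, "dev-config-2.domain.com:27019"). The string we actually typed, with the "csReplSet/" prefix, would throw before any reconfiguration is attempted.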
- duplicates
-
SERVER-37190 Don't allow adding replica set connection strings for a member in a replica set config
- Closed