
Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Operating System: ALL
Steps To Reproduce:


In a sharded environment running with a replicated config server, run rs.add() against the config server replica set with the replica set name prepended to the host's DNS name and port.

That is, rs.add("replsetname/host:port") instead of rs.add("host:port").

Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We have a sharded development cluster running 3.2.18 that we are migrating from SCCC to replicated config servers.

The cluster consists of 3 config servers, 18 MongoD shards, and 5 MongoS instances.

At one point in the migration, after we had already switched to the replicated config server configuration, we accidentally issued a bad rs.add() command from the mongo shell against the primary config server:

> rs.add("csReplSet/dev-config-2.domain.com:27019")

We should not have included the initial "csReplSet/" within the string (human error).
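To illustrate the kind of check that would have caught this typo, here is a minimal pre-flight sketch in JavaScript, the mongo shell's language. The function name memberHostProblem is hypothetical and is not part of MongoDB. It assumes a member is passed as "host:port" (or a bare host) and treats anything containing "/" or "," as a replica set connection string rather than a member address. It sticks to older language features, so it should run in Node.js and, presumably, in the 3.2 shell.

```javascript
// Hypothetical helper, not part of MongoDB: returns null if `member` looks
// like a plain member address ("host:port" or a bare host), otherwise a short
// description of the problem.
function memberHostProblem(member) {
  if (typeof member !== "string" || member.length === 0) {
    return "member must be a non-empty string";
  }
  // "setName/host:port" and comma-separated seed lists are replica set
  // connection strings, not member addresses.
  if (member.indexOf("/") !== -1 || member.indexOf(",") !== -1) {
    return "looks like a connection string; pass only host:port";
  }
  var colon = member.lastIndexOf(":");
  if (colon === -1) {
    return null; // no port given; accepted here (the default port is implied)
  }
  var host = member.slice(0, colon);
  var port = Number(member.slice(colon + 1));
  if (host.length === 0) {
    return "missing host name";
  }
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return "invalid port";
  }
  return null;
}
```

In the shell, one would call it first and only run rs.add(member) when it returns null. SERVER-37190, which this issue duplicates, proposes rejecting such strings in the server itself, which is the more robust place for the check.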

However, what happened next was concerning.

The config servers were fine (rs.status() showed only that they could not reach the new host), but every MongoS and MongoD host produced a backtrace and core dump and then terminated.

Here is what rs.status() on the config server reported for the added host:

    "_id" : 3,
    "name" : "csReplSet/dev-config-3.domain.com:27019",
    "health" : 0,
    "state" : 8,
    "stateStr" : "(not reachable/healthy)",
    "uptime" : 0,
    "optime" : {
        "ts" : Timestamp(0, 0),
        "t" : NumberLong(-1)
    },

Here's the log output from a MongoD (shard) just before the backtrace.

2018-09-14T15:38:55.179+0000 I NETWORK  [ReplicaSetMonitorWatcher] changing hosts to csReplSet/dev-config-0.domain.com:27019,dev-config-4.domain.com:27019,csReplSet/dev-config-3.domain.com:27019 from csReplSet/dev-config-0.domain.com:27019,dev-config-4.domain.com:27019
2018-09-14T15:38:55.179+0000 I -        [ReplicaSetMonitorWatcher] Invariant failure setName == connString.getSetName() src/mongo/s/config.cpp 770
2018-09-14T15:38:55.179+0000 I -        [ReplicaSetMonitorWatcher]

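It looks like the MongoS and MongoD hosts tried to update their config server list to include the new host without validating the host string first: the new host list in the log above still carries the "csReplSet/" prefix on that entry, and the invariant failure follows immediately.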
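The host list from the "changing hosts to" log line can be checked the same way. The sketch below is a hypothetical illustration, not how mongod or mongos actually parse connection strings: parseReplSetConnString is an invented name, and it simply splits a "setName/host,host,..." string at the first "/" and reports any host entry that still contains a "/".

```javascript
// Hypothetical sketch: split "setName/host1:port,host2:port,..." and flag any
// host entry that still carries its own "setName/" prefix.
function parseReplSetConnString(connString) {
  var slash = connString.indexOf("/");
  if (slash === -1) {
    throw new Error("no set name in '" + connString + "'");
  }
  var setName = connString.slice(0, slash);
  var hosts = connString.slice(slash + 1).split(",");
  var malformed = hosts.filter(function (h) {
    return h.indexOf("/") !== -1;
  });
  return { setName: setName, hosts: hosts, malformed: malformed };
}

// The new host list from the shard's log line.
var logged = "csReplSet/dev-config-0.domain.com:27019," +
  "dev-config-4.domain.com:27019,csReplSet/dev-config-3.domain.com:27019";
var parsed = parseReplSetConnString(logged);
// parsed.malformed -> ["csReplSet/dev-config-3.domain.com:27019"]
```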

We have restored our development environment from backup and will be testing our process again. I don't have the full set of log files to provide here, but we can try this again if you need more details.

Reproducing it shouldn't be hard, though: just add a bad host to the config server replica set!

Duplicates: SERVER-37190 Don't allow adding replica set connection strings for a member in a replica set config (Closed)

Assignee: Nick Brewer (Inactive)
Reporter: Dave Muysson
Participants: Dave Muysson, Nick Brewer, Spencer Brody
Votes: 0
Watchers: 7

Created: Sep 17 2018 01:47:22 PM UTC
Updated: Sep 19 2018 12:54:04 PM UTC
Resolved: Sep 18 2018 06:06:57 PM UTC
