[SERVER-37162] Bad host string given to rs.add() on Replicated Config servers may take down entire cluster
Created: 17/Sep/18 Updated: 19/Sep/18 Resolved: 18/Sep/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dave Muysson | Assignee: | Nick Brewer |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | In a sharded environment running with replicated config servers, run rs.add() against the config server replica set with the replica set name prepended to the host DNS name and port.
That is, rs.add("replsetname/host:port") instead of rs.add("host:port") |
| Description |
We have a sharded development cluster running 3.2.18 that we are migrating from SCCC to replicated config servers. The cluster consists of 3 config servers, 18 MongoD shards, and 5 MongoS instances. Partway through the migration, once the cluster was already using replicated config servers, we accidentally issued a bad rs.add() command from the mongo shell against the primary config server:
> rs.add("csReplSet/dev-config-2.domain.com:27019")
We should not have included the leading "csReplSet/" in the string (human error). However, what happened next was concerning: while the config servers were fine (rs.status() showed that they couldn't reach the new host), every MongoS and MongoD host produced a backtrace/core dump and terminated.
Here is what rs.status() on the config server reported for the added host:
Here's the log output from a MongoD (shard) just before the backtrace.
It looks like the MongoS and MongoD hosts tried to adjust their config server list to add the new host, but did not validate the hostname before trying to use it?
We recovered our development environment from backup and are going to test our process again. While I don't have the full set of log files to provide here, we could try this again if you need more details. Reproducing it shouldn't be hard, though: just add a bad host to the config server replica set! |
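Until the server rejects this input itself, a client-side pre-flight check can catch the typo before the command reaches the primary. The sketch below assumes a hypothetical helper, `parseMemberHost` (not a mongo shell built-in), that accepts only a plain `hostname:port` string and rejects the `setName/host:port` connection-string form used in the bad `rs.add()` call above. rs.add() also accepts a member configuration document; this sketch only covers the string form, and the validation rules are illustrative assumptions, not the server's actual checks.

```javascript
// Hypothetical pre-flight check (not a mongo shell built-in), written in
// ES5 style so it should also run in older mongo shells.
// Accepts only "hostname:port"; IPv6 literals are deliberately not handled.
function parseMemberHost(hostString) {
  if (typeof hostString !== "string" || hostString.length === 0) {
    throw new Error("host must be a non-empty string");
  }
  // "setName/host:port" is connection-string syntax, not a member host.
  if (hostString.indexOf("/") !== -1) {
    throw new Error('"' + hostString + '" looks like a replica set ' +
        'connection string; pass only "host:port" to rs.add()');
  }
  var parts = hostString.split(":");
  if (parts.length !== 2 || parts[0].length === 0) {
    throw new Error('"' + hostString + '" is not in host:port form');
  }
  // Port must be all digits and within the TCP port range.
  if (!/^[0-9]+$/.test(parts[1])) {
    throw new Error('"' + hostString + '" has a non-numeric port');
  }
  var port = parseInt(parts[1], 10);
  if (port < 1 || port > 65535) {
    throw new Error('"' + hostString + '" has an out-of-range port');
  }
  return { host: parts[0], port: port };
}
```

In the shell, one would call `parseMemberHost(h)` first and only then `rs.add(h)`; with the "csReplSet/dev-config-2.domain.com:27019" string from this report, the helper throws instead of letting the bad member reach the config.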
| Comments |
| Comment by Nick Brewer [ 18/Sep/18 ] |
dave.muysson@360pi.com We've determined that the best way to prevent this is to strictly disallow including connection strings in a replica set config. We've opened a separate ticket to track this work, which you can follow here: Since we're now tracking this elsewhere, I'm going to go ahead and close this ticket. Thanks again for your detailed report, and please let us know if you have any questions. -Nick |
| Comment by Spencer Brody (Inactive) [ 18/Sep/18 ] |
This does seem like a real bug. I think the proper fix is that we shouldn't allow a replica set connection string for a 'host' field in a replica set config. I filed |
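The rule proposed here can be illustrated over a whole replica set config rather than a single rs.add() argument. The sketch below is a hypothetical illustration: the function names `invalidMemberHosts` and `assertValidReplSetConfig` are invented, and the real check would presumably live in the server's (C++) replica set config validation. It only mirrors the rule over a document shaped like `rs.conf()` output, where each entry in `members` carries a `host` field.

```javascript
// Hypothetical illustration of the proposed rule: a replica set config is
// rejected if any member "host" is missing or uses "setName/host:port"
// connection-string syntax. Operates on a document shaped like rs.conf().
function invalidMemberHosts(config) {
  var members = (config && config.members) || [];
  return members.filter(function (member) {
    return typeof member.host !== "string" || member.host.indexOf("/") !== -1;
  });
}

// Throws listing every offending host; returns silently when all are valid.
function assertValidReplSetConfig(config) {
  var bad = invalidMemberHosts(config);
  if (bad.length > 0) {
    var hosts = bad.map(function (member) { return String(member.host); });
    throw new Error("invalid member host(s): " + hosts.join(", "));
  }
}
```

For example, running `assertValidReplSetConfig(rs.conf())` on the config server primary after a membership change would flag an entry like the one added in this report.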
| Comment by Dave Muysson [ 18/Sep/18 ] |
Thanks Nick - Very happy to hear you were able to reproduce it locally! If there's anything we can do to help out on our end, just let us know.
| Comment by Nick Brewer [ 17/Sep/18 ] |
dave.muysson@360pi.com Thanks for your report. I've managed to reproduce this, and we're currently investigating. -Nick |