[SERVER-6083] Invalid replica set connection management Created: 13/Jun/12  Updated: 03/Jan/18  Resolved: 16/Sep/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Aristarkh Zagorodnikov Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Irrelevant


Operating System: ALL
Participants:
Case:

 Description   

When a server already runs a MongoDB instance on the default port and the default (hostname-determined) IP address, initiating another instance on the same port but a different IP address fails.
We have a server with two interfaces, ip1 and ip2. The server hostname resolves to ip1. ip1 hosts a node of a replica set called "A". I want a node of another replica set, called "B", to be hosted on ip2.
The "A" RS node has bind_ip=ip1,127.0.0.1 and the "B" RS node has bind_ip=ip2. After initiating RS "B" on the node (using rs.initiate()) I get "[rsStart] replSet exception loading our local replset configuration object : 13132 nonmatching repl set name in _id field; check --replSet command line", which is obviously wrong, since I made sure beforehand that the node's data directory was clean. I spent two hours trying to diagnose the problem. The solution was subtle: I had to change the port from the default to another one, and then it worked. It appears that the RS code tries to do some "magic" with host names and IPs, connects to the wrong IP address, finds the "wrong replica set name" there, and then bails out.

This was very frustrating and I hope you will fix it soon. Also, the message could disclose a little more information (for example, which replica set name was encountered instead of the expected one). I think I've seen a case asking for that, but I can't find it now.
I also believe that connecting to the wrong machine could lead to disastrous results when two different replica sets share the same name.

And while you're at it, please fix the "mongo host" vs "mongo host.zone" problem (if there are no dots in the name, the shell always tries the local host instead of resolving the host according to the search order specified in resolv.conf). I think it's the same problem.



 Comments   
Comment by Aristarkh Zagorodnikov [ 15/Sep/12 ]

We scrapped all these machines already and moved to VMs that have unique network interfaces, so it's no longer a problem.

Comment by Eliot Horowitz (Inactive) [ 15/Sep/12 ]

I don't think that's what happened.
When you bound to 127.0.0.1 but didn't specify the IP in initiate, it got confused.
You need to specify the host in the config in that case.
Can you try that?
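
For illustration, a minimal sketch of a config document that names the member's host explicitly might look like the following. The helper name buildReplSetConfig is hypothetical, "ip2" is the placeholder address from the description, and 27017 is assumed to be the default port in use; none of this is taken from the reporter's actual setup.

```javascript
// Hypothetical helper: builds a replica set config document with an explicit
// host for the single initial member, rather than relying on the host that
// rs.initiate() picks when it is called without arguments.
function buildReplSetConfig(setName, host, port) {
  return {
    _id: setName, // must match the name given to --replSet
    members: [{ _id: 0, host: host + ":" + port }]
  };
}

// "ip2" is the placeholder address from the report; 27017 is an assumed port.
var config = buildReplSetConfig("B", "ip2", 27017);

// In a mongo shell connected to the "B" node, the document would be passed as:
//   rs.initiate(config)
```

With the host spelled out this way, the member entry stored in the config refers to ip2 rather than to whatever the server hostname or localhost resolves to.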

Comment by Aristarkh Zagorodnikov [ 13/Jun/12 ]

Also, the following log line, which appears when the port is changed right after initiation, indicates that it really tries to connect to localhost:
[rsStart] couldn't connect to localhost:37017: couldn't connect to server localhost:37017
