[SERVER-5537] Replica set member behind NAT stopped joining after upgrade Created: 06/Apr/12  Updated: 15/Aug/12  Resolved: 07/Apr/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mage Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

FreeBSD 9 stable


Operating System: FreeBSD
Participants:

 Description   

I have a 3-member replica set on FreeBSD servers. It worked until I upgraded the servers to 2.0.3 or 2.0.4; I am not sure which release broke it. I am now on 2.0.4.

It definitely worked with MongoDB 2.0.2 for months.

The problem is that one of the members (server 3) is behind NAT. The other two are dedicated servers. Server 3 acts as arbiter and backup.

Server 3 (behind NAT) log:

Fri Apr 6 23:27:34 [conn39] authenticate: { authenticate: 1, nonce: "xxxxxx", user: "__system", key: "xxxxxx" }

Fri Apr 6 23:27:41 [rsStart] replSet error self not present in the repl set configuration:
Fri Apr 6 23:27:41 [rsStart] { _id: "xxxx", version: 5, members: [ { _id: 0, host: "aaaa:27017" }, { _id: 1, host: "bbbb:27017" }, { _id: 3, host: "cccc:27017", priority: 0.0, hidden: true } ] }
Fri Apr 6 23:27:41 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.

The address "cccc" is the external address of my home router. The port is forwarded from the router to server 3, which has a local IP address like 192.168.1.x.

The replica set config contains the external IP address of server 3, of course. This setup worked fine in the past.

Now it says:

{
"_id" : 3,
"name" : "cccc:27017",
"health" : 1,
"state" : 8,
"stateStr" : "DOWN",
"uptime" : 104,
"optime" : { "t" : 1333047253000, "i" : 10 },
"optimeDate" : ISODate("2012-03-29T18:54:13Z"),
"lastHeartbeat" : ISODate("2012-04-06T21:20:13Z"),
"pingMs" : 36,
"errmsg" : "still initializing"
}
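
For anyone triaging a similar status dump, the entry above can be picked out programmatically. The following is a minimal Python sketch, assuming the status output has already been loaded into a plain dictionary with a top-level "members" list; the helper name down_members and the sample data are illustrative and not part of MongoDB.

```python
def down_members(status):
    """Return (name, errmsg) pairs for every member reported as DOWN."""
    return [
        (member.get("name"), member.get("errmsg", ""))
        for member in status.get("members", [])
        if member.get("stateStr") == "DOWN"
    ]


# Sample shaped like the excerpt in this report (placeholder host names).
sample = {
    "members": [
        {"_id": 0, "name": "aaaa:27017", "health": 1, "state": 1,
         "stateStr": "PRIMARY"},
        {"_id": 3, "name": "cccc:27017", "health": 1, "state": 8,
         "stateStr": "DOWN", "errmsg": "still initializing"},
    ]
}

for name, errmsg in down_members(sample):
    print(name, "is DOWN:", errmsg)
```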

It sends heartbeats. The firewall config is fine: I can connect with the "mongo" client from server 1 and from server 2 to server 3. I think it just doesn't join the set because it thinks it is not a member.

Setting bind_ip in the config or leaving it out makes no difference.

In the past I used bind_ip with the 192.168.1.x address. This is why I had these lines in the log:
[startReplSets] couldn't connect to localhost:27017: couldn't connect to server localhost:27017

But even with these lines it worked like a charm before the upgrade. Now it doesn't work, whether I use bind_ip or not.

I think that the key line is: [rsStart] replSet error self not present in the repl set configuration. I can't set the external IP for bind_ip, of course.

I just deleted the whole data directory of server 3 and restarted it. That didn't help.



 Comments   
Comment by Mage [ 07/Apr/12 ]

It seems to me that this is actually a DD-WRT issue. Since the latest DD-WRT firmware upgrade, server 3 can't reach itself through its external IP address via port forwarding.
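
A quick way to confirm this kind of problem is to test, from the NAT-ed machine itself, whether each address listed in the replica set config accepts a TCP connection. The following is a minimal Python sketch using only the standard library; the helper names can_reach and unreachable_members are illustrative, and this is a plain reachability test, not a reproduction of MongoDB's own self-detection logic.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_members(members, timeout=3.0):
    """Return the "host:port" strings from members that cannot be reached."""
    failed = []
    for member in members:
        host, _, port = member.rpartition(":")
        if not can_reach(host, int(port), timeout):
            failed.append(member)
    return failed
```

Run on server 3 with the member list from the config, for example unreachable_members(["aaaa:27017", "bbbb:27017", "cccc:27017"]) with the placeholders replaced by real addresses. If only the machine's own external address is reported while the other members can connect to it, the router is probably not looping port-forwarded traffic back to the internal network.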

Sorry for the noise.
