  1. Core Server
  2. SERVER-5537

Replica set member behind NAT stopped joining after upgrade

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.0.4
    • Component/s: None
    • Labels:
      None
    • Environment:
      FreeBSD 9 stable
    • FreeBSD

      I have a 3-member replica set on FreeBSD servers. It worked until I upgraded the servers to 2.0.3 or 2.0.4; I am not sure which upgrade broke it. I am now on 2.0.4.

      It definitely worked with MongoDB 2.0.2 for months.

      The problem is that one of the members (server 3) is behind NAT. The other two are dedicated servers. Server 3 is the arbiter and backup.

      Server 3 (behind NAT) log:

      Fri Apr 6 23:27:34 [conn39] authenticate: { authenticate: 1, nonce: "xxxxxx", user: "__system", key: "xxxxxx" }

      Fri Apr 6 23:27:41 [rsStart] replSet error self not present in the repl set configuration:
      Fri Apr 6 23:27:41 [rsStart] { _id: "xxxx", version: 5, members: [ { _id: 0, host: "aaaa:27017" }, { _id: 1, host: "bbbb:27017" }, { _id: 3, host: "cccc:27017", priority: 0.0, hidden: true } ] }
      Fri Apr 6 23:27:41 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.

      The address "cccc" is the external address of my home router. The port is forwarded from the router to server 3, which has a local IP address like 192.168.1.x.

      The replica set config contains the external IP address of server 3, of course. This worked fine in the past.

      Now it says:

      "_id" : 3,
      "name" : "cccc:27017",
      "health" : 1,
      "state" : 8,
      "stateStr" : "DOWN",
      "uptime" : 104,
      "optime" : { "t" : 1333047253000, "i" : 10 },
      "optimeDate" : ISODate("2012-03-29T18:54:13Z"),
      "lastHeartbeat" : ISODate("2012-04-06T21:20:13Z"),
      "pingMs" : 36,
      "errmsg" : "still initializing"
      }
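
      The combination above (health 1, state 8 "DOWN", errmsg "still initializing") can be read mechanically. The following is a minimal Python sketch, assuming that "health" : 1 means the heartbeat itself succeeded; the function name describe_member is hypothetical and the dictionary simply copies the fields shown above.

```python
def describe_member(status: dict) -> str:
    """Summarize one entry of the replica set status 'members' array."""
    name = status.get("name", "?")
    reachable = status.get("health") == 1  # assumption: 1 means heartbeat succeeded
    state = status.get("stateStr", "UNKNOWN")
    errmsg = status.get("errmsg", "")
    if reachable and state == "DOWN":
        # The member answers, but it is not acting as part of the set.
        return f"{name}: answers heartbeats but has not joined ({errmsg})"
    if not reachable:
        return f"{name}: unreachable ({errmsg})"
    return f"{name}: {state}"


# Fields copied from the status output in this report.
server3 = {
    "_id": 3,
    "name": "cccc:27017",
    "health": 1,
    "state": 8,
    "stateStr": "DOWN",
    "pingMs": 36,
    "errmsg": "still initializing",
}

print(describe_member(server3))
```

      For the entry in this report, the sketch reports a member that responds to heartbeats but has not joined the set.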

      It sends heartbeats. The firewall config is fine: I can connect with the "mongo" client from server 1 and from server 2 to server 3. I think server 3 simply does not join the set because it believes it is not a member.
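
      The connectivity check described here can also be scripted. This is a minimal Python sketch using only the standard library; the helper name can_connect is hypothetical. It only shows that a TCP connection can be opened through the forwarded port, which says nothing about whether server 3 recognizes itself in the configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

      Run from server 1 or server 2, can_connect("cccc", 27017) would be expected to return True in the setup described in this report.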

      It makes no difference whether I set bind_ip in the config or not.

      In the past I used bind_ip with the 192.168.1.x address, which is why I had these lines in the log:
      [startReplSets] couldn't connect to localhost:27017: couldn't connect to server localhost:27017

      But even with these lines it worked like a charm before the upgrade. Now it fails whether I use bind_ip or not.

      I think the key line is: [rsStart] replSet error self not present in the repl set configuration. I can't set bind_ip to the external IP, of course, since that address belongs to the router and not to server 3.
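
      To illustrate the suspected failure mode, here is a minimal Python sketch. It assumes that a member looks for itself by checking whether any configured host resolves to one of its own local addresses. That is an assumption made for illustration, not a description of what mongod actually does. The function find_self, the resolver table and all IP addresses are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Set


def find_self(members: List[dict],
              resolved: Dict[str, Set[str]],
              local_addresses: Set[str]) -> Optional[dict]:
    """Return the config entry whose host resolves to a local address, if any."""
    for member in members:
        hostname = member["host"].rsplit(":", 1)[0]  # strip the port
        if resolved.get(hostname, set()) & local_addresses:
            return member
    return None


# Member list modeled on the config in this report.
members = [
    {"_id": 0, "host": "aaaa:27017"},
    {"_id": 1, "host": "bbbb:27017"},
    {"_id": 3, "host": "cccc:27017", "priority": 0.0, "hidden": True},
]

# Hypothetical name resolution results; the addresses are made up.
resolved = {
    "aaaa": {"198.51.100.10"},
    "bbbb": {"198.51.100.20"},
    "cccc": {"203.0.113.5"},  # external address of the home router
}

# Hypothetical addresses bound on server 3 (loopback plus a LAN address).
server3_local = {"127.0.0.1", "192.168.1.50"}

print(find_self(members, resolved, server3_local))  # prints None
```

      Under this assumption, server 3 never finds a match, because "cccc" resolves to the router's external address, which is not bound on any interface of server 3.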

      I just deleted the whole data directory of server 3 and restarted it. That didn't help.

            Assignee:
            Unassigned
            Reporter:
            Mage
            Votes:
            0
            Watchers:
            0
