[SERVER-4130] isself too aggressive when bind_ip also specified Created: 24/Oct/11  Updated: 11/Jul/16  Resolved: 23/Dec/11

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 2.0.1
Fix Version/s: 2.1.0

Type: Bug Priority: Major - P3
Reporter: Stephen J. Smith Assignee: Brandon Diamond
Resolution: Done Votes: 0
Labels: replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 11.04, Mongodb 2.0.1


Operating System: Linux
Participants:

 Description   

We are trying to secure our new multi-datacenter mongo cluster using stunnel. To do so we created additional interfaces, assigned them unused ips, and added entries to /etc/hosts. Communication was tested with mongo --host and is working as expected. We use bind_ip to carefully control which of the available interfaces mongod is listening on.

The issue we ran into is that adding the new member runs afoul of the isself() check:

Mon Oct 24 07:13:00 [conn14] run command admin.$cmd { replSetReconfig: { _id: "members-a", version: 7, members: [ { _id: 0, host: "ecnext80:27017" }, { _id: 1, host: "ecnext78:27017" }, { _id: 2, host: "ecnext82:27017" }, { _id: 3.0, host: "rsmongo008:27017" } ] } }
Mon Oct 24 07:13:00 [conn14] replSet replSetReconfig config object parses ok, 4 members specified
Mon Oct 24 07:13:00 [conn14] getMyAddrs(): [127.0.0.1] [10.27.17.1] [10.27.17.2] [10.27.17.3] [10.27.17.5] [10.27.17.6] [10.27.17.7] [10.27.17.4] [10.28.16.151] [216.12.148.188] [172.18.5.34] [::1] [fe80::5054:ff:fed2:ba9e%eth0] [fe80::5054:ff:fe68:1fa8%eth1] [fe80::5054:ff:fe38:7593%eth2]
Mon Oct 24 07:13:00 [conn14] getallIPs("rsmongo008"): [10.27.17.4]
Mon Oct 24 07:13:00 [conn14] User Assertion: 13278:bad config: isSelf is true for multiple hosts: ecnext82:27017,rsmongo008:27017
Mon Oct 24 07:13:00 [conn14] replSet replSetReconfig exception: bad config: isSelf is true for multiple hosts: ecnext82:27017,rsmongo008:27017
Mon Oct 24 07:13:00 [conn14] command admin.$cmd command: { replSetReconfig: { _id: "members-a", version: 7, members: [ { _id: 0, host: "ecnext80:27017" }, { _id: 1, host: "ecnext78:27017" }, { _id: 2, host: "ecnext82:27017" }, { _id: 3.0, host: "rsmongo008:27017" } ] } } ntoreturn:1 exception: bad config: isSelf is true for multiple hosts: ecnext82:27017,rsmongo008:27017 code:13278 reslen:183 0ms

Those 10.27.17.* ips are actually stunnel client endpoints. Further, the server itself is not listening on them:

stsmith@ecnext82:~$ grep bind_ip /etc/mongodb-members-a.conf
bind_ip=127.0.0.1,10.28.16.151

Shouldn't isself() account for bind_ip? Specifically, shouldn't getMyAddrs() only return the ips specified via bind_ip – 127.0.0.1 and 10.28.16.151 in this case?
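To make the question concrete, here is a minimal C++ sketch of the proposed behavior. It is not the server's code: splitBindIp, selfAddrs, and isSelfSketch are hypothetical names, and the interface address list is passed in as a plain vector instead of being read with getifaddrs. The idea is that when bind_ip is set, only the bound addresses count as "mine".

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a bind_ip value such as
// "127.0.0.1,10.28.16.151" into individual addresses, skipping empty entries.
std::vector<std::string> splitBindIp(const std::string& bindIp) {
    std::vector<std::string> out;
    std::stringstream ss(bindIp);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Hypothetical stand-in for the address list used by the isSelf check:
// if bind_ip is set, return only the bound addresses; otherwise fall back
// to every interface address (the behavior described in this report).
std::vector<std::string> selfAddrs(const std::string& bindIp,
                                   const std::vector<std::string>& interfaceAddrs) {
    std::vector<std::string> bound = splitBindIp(bindIp);
    return bound.empty() ? interfaceAddrs : bound;
}

// True if any address a host name resolves to is one of our own addresses.
bool isSelfSketch(const std::vector<std::string>& resolved,
                  const std::vector<std::string>& mine) {
    for (const std::string& ip : resolved) {
        if (std::find(mine.begin(), mine.end(), ip) != mine.end()) return true;
    }
    return false;
}
```

With bind_ip=127.0.0.1,10.28.16.151, a host that resolves to 10.27.17.4 (as rsmongo008 does in the log above) would no longer match, while a host resolving to 10.28.16.151 still would.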

We tried {force:true} for the reconfig, but it didn't help. We also tried having the interface unconfigured when we added rsmongo008, but bad things happened once we brought up that interface.

We are in a position to quickly build/test any suggested patches.

Thanks,
Stephen J. Smith
Systems Engineer
Manta Media, Inc.



 Comments   
Comment by auto [ 23/Dec/11 ]

Author: Brandon Diamond <brandon@10gen.com>

Message: SERVER-4130: defer to bind_ip arg when getting addrs
Branch: master
https://github.com/mongodb/mongo/commit/68bc1931ee48863bf52c8317395107c6a7b7f617

Comment by Stephen J. Smith [ 27/Oct/11 ]

I hardcoded my bind_ip list in getMyAddrs() to verify the proposed solution. I was able to bring up my cluster secured by stunnel as originally planned. This is the patch that worked:

--- mongodb-src-r2.0.1.orig/db/commands/isself.cpp 2011-10-21 20:52:16.000000000 -0400
+++ mongodb-src-r2.0.1/db/commands/isself.cpp 2011-10-27 10:27:04.100403316 -0400
@@ -40,6 +40,8 @@ vector<string> getMyAddrs() {
 massert(13469, "getifaddrs failure: " + errnoWithDescription(errno), status == 0);

 vector<string> out;
+ out.push_back("127.0.0.1");
+ return out;

 // based on example code from linux getifaddrs manpage
 for (ifaddrs * addr = addrs; addr != NULL; addr = addr->ifa_next) {

When getMyAddrs() returns the bind_ip list, the machine name has to resolve to one of the bound ips or you get an error like this:

Thu Oct 27 09:20:35 [initandlisten] Socket recv() errno:104 Connection reset by peer 10.160.111.136:27017
Thu Oct 27 09:20:35 [initandlisten] SocketException: remote: 10.160.111.136:27017 error: 9001 socket exception [1] server [10.160.111.136:27017]
Thu Oct 27 09:20:35 [initandlisten] DBClientCursor::init call() failed
Thu Oct 27 09:20:35 [initandlisten] warning: could't check isSelf (aws003:27017) DBClientBase::findN: transport error: aws003:27017 query: { _isSelf: 1 }

Solved by updating /etc/hosts so that aws003 was on the 127.0.0.1 line rather than 10.160.111.136.
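The requirement above (the machine name must resolve to one of the bound IPs) can be checked ahead of time. The following is a rough POSIX-only sketch, not part of mongod: resolveIPv4 and resolvesToBoundIp are hypothetical helpers, only IPv4 is considered, and the bound list is supplied by the caller.

```cpp
#include <algorithm>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <vector>

// Hypothetical helper: return the IPv4 addresses a host name resolves to,
// using the system resolver (which consults /etc/hosts on a typical setup).
std::vector<std::string> resolveIPv4(const std::string& host) {
    std::vector<std::string> out;
    addrinfo hints = addrinfo();
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) return out;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        sockaddr_in* sin = reinterpret_cast<sockaddr_in*>(p->ai_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr) {
            out.push_back(buf);
        }
    }
    freeaddrinfo(res);
    return out;
}

// Hypothetical check: does the host resolve to at least one bound address?
bool resolvesToBoundIp(const std::string& host,
                       const std::vector<std::string>& bound) {
    std::vector<std::string> ips = resolveIPv4(host);
    for (const std::string& ip : ips) {
        if (std::find(bound.begin(), bound.end(), ip) != bound.end()) return true;
    }
    return false;
}
```

In the aws003 case, a call such as resolvesToBoundIp("aws003", bound) with bound holding only 127.0.0.1 would be expected to fail while the name mapped to 10.160.111.136 and succeed after the /etc/hosts change.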
