[SERVER-1813] replSet does not work when bind="internal_ip" Created: 20/Sep/10  Updated: 12/Jul/16  Resolved: 27/Sep/10

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.1
Fix Version/s: 1.7.1

Type: Bug Priority: Major - P3
Reporter: Gilles Devaux Assignee: Mathias Stearn
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centos


Operating System: Linux
Participants:

 Description   

server started with

bind=internal_ip
replSet=set1

> rs.initiate()
{
"startupStatus" : 4,
"info" : "set1",
"errmsg" : "all members and seeds must be reachable to initiate set",
"ok" : 0
}

It works when 'bind' is not specified, but then it's impossible to rs.add('internal_ip').



 Comments   
Comment by Jeff Yemin (Inactive) [ 04/Dec/10 ]

We would like to see this change backported to 1.6.x.

Comment by Eliot Horowitz (Inactive) [ 05/Oct/10 ]

It's only in 1.7 now.
We might be able to backport.
Can you verify 1.7 works in your environment?

Comment by Ask Bjørn Hansen [ 05/Oct/10 ]

Is this fix in the 1.6.x snapshots or only in 1.7.x? It bit us, too...

Comment by Mathias Stearn [ 27/Sep/10 ]

I've merged in Gilles's changes and made the modifications requested by Dwight. Please reopen if you still have this issue in the latest codebase.

Comment by auto [ 27/Sep/10 ]

Author:

{'login': 'RedBeard0531', 'name': 'Mathias Stearn', 'email': 'redbeard0531@gmail.com'}

Message: Move bind_ip handling from me() to Me(). SERVER-1813
http://github.com/mongodb/mongo/commit/8f6b5ab3d2fade6a6f6b37af48f55b5a204a6401

Comment by auto [ 27/Sep/10 ]

Author:

{'login': 'gilles', 'name': 'gilles', 'email': 'gilles@peerpong.com'}

Message: Don't need to copy bind_ip SERVER-1813
http://github.com/mongodb/mongo/commit/aab1fcff76c2374692f8c56bfefea3e6203a6eb4

Comment by auto [ 27/Sep/10 ]

Author:

{'login': '', 'name': 'root', 'email': 'root@mongotest1.peerpong.net'}

Message: Make hostname.me() smarter
Use hostname.me() when creating a repl config from scratch

bug #SERVER-1813
http://jira.mongodb.org/browse/SERVER-1813
http://github.com/mongodb/mongo/commit/1cefd8cc5dbd9901b9de21c3e07dcefdca172d3e

Comment by Eric Anderson [ 27/Sep/10 ]

Just want to add I have this same issue. I also prefer #1 for the same reasons as Gilles.

Comment by Gilles Devaux [ 24/Sep/10 ]

I just tested an explicit initiate and it still does not work, though I don't understand why:

> config = {_id: 'set1', members: [
... {_id: 0, host: '10.177.163.57:27017'},
... {_id: 1, host: '10.177.163.62:27017'}]
... }
{
"_id" : "set1",
"members" : [
{ "_id" : 0, "host" : "10.177.163.57:27017" },
{ "_id" : 1, "host" : "10.177.163.62:27017" }
]
}
> config
{
"_id" : "set1",
"members" : [
{ "_id" : 0, "host" : "10.177.163.57:27017" },
{ "_id" : 1, "host" : "10.177.163.62:27017" }
]
}
>
>
> rs.initiate(config)
{
"startupStatus" : 4,
"info" : "set1",
"errmsg" : "all members and seeds must be reachable to initiate set",
"ok" : 0
}
> rs.conf()
null

both IPs are accessible with telnet from both machines

Could it be that mongo removes IPs that are 'self' during initiate() and adds ::Me()?

If you don't mind me saying, I also prefer #1. We don't use BIND or any internal DNS server (we are a startup with engineers doing ops, and BIND is a nightmare), so we are IP-based only; we limit possible casualties with a provisioning system that lets us change config very quickly. Since a hostname is not always available when an IP is specified, mongo should use the IP in that case; when a hostname is specified, the hostname is fine.

Comment by Eliot Horowitz (Inactive) [ 24/Sep/10 ]

We should do #1.
IPs are often better than hostnames anyway, since hostnames tend to be meaningless and, in the worst case, duplicated.

Comment by Dwight Merriman [ 24/Sep/10 ]

mathias,

the only place HostAndPort::Me() is used is in the auto-initiation of a set with no explicit config. in that case we need to infer the machine's true name. this is a convenience thing; one can always explicitly specify the hostname. thus we can't use me(), at least as-is.

i think there are two possible options:

(1) the patch suggested above (although it should go in HostAndPort::Me()): that is, use the --bind_ip command line parameter. the one issue with this is that the repl set config will then contain an IP address instead of a logical hostname.

or

(2) when bind_ip is in use, have Me() assert. additionally, make auto replSetInitiate return a nice error message in that case saying explicit initiation will be required.

Eliot do you prefer #1 or #2?
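
For readers following the discussion, option (1) can be modeled roughly as below. This is only a sketch in shell-style JavaScript, not the server's C++ code: the function name selfHostAndPort, its parameters, and the rule of skipping loopback entries and taking the first remaining bind_ip value are all assumptions made for illustration.

```javascript
// Model of option (1): prefer the bind_ip setting over the machine hostname
// when working out the address a server should advertise for itself.
// selfHostAndPort and its parameters are hypothetical, for illustration only.
function selfHostAndPort({ bindIp = "", hostname, port = 27017 }) {
  // Assumption: loopback entries are useless to other members, so skip them.
  const loopback = new Set(["localhost", "127.0.0.1"]);
  const candidates = bindIp
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s && !loopback.has(s));
  // Assumption: take the first usable bind_ip entry, else fall back to the hostname.
  const host = candidates.length > 0 ? candidates[0] : hostname;
  return `${host}:${port}`;
}

console.log(selfHostAndPort({ bindIp: "10.177.163.57", hostname: "mongotest1" }));
console.log(selfHostAndPort({ hostname: "mongotest1" }));
```

With bind_ip set, the config would carry the IP address (the concern raised in option (1)); without it, the hostname is used as before.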

Comment by Dwight Merriman [ 24/Sep/10 ]

@Gilles thanks for reporting this. i believe if you initiate explicitly, instead of using the defaults, it will work. i.e. :

> rs.initiate({ _id : setname, members : ... })

instead of

> rs.initiate()

will look into further.
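
A minimal sketch of building such an explicit config follows. The helper buildReplSetConfig is hypothetical (it is not part of the mongo shell); the document shape, with _id set to the set name and a members array of { _id, host } entries, follows the configs shown elsewhere in this ticket.

```javascript
// Hypothetical helper (not a mongo shell built-in): builds an explicit
// replica set config from a set name and a list of "host:port" strings.
function buildReplSetConfig(setName, hosts) {
  if (!setName) throw new Error("setName is required");
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  return {
    _id: setName,
    // Member _id values are taken from the list position, as in the
    // examples in this ticket.
    members: hosts.map((host, i) => ({ _id: i, host: host })),
  };
}

const config = buildReplSetConfig("set1", [
  "10.177.163.57:27017",
  "10.177.163.62:27017",
]);
console.log(JSON.stringify(config, null, 2));
// In the mongo shell, the result would then be passed to rs.initiate(config).
```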

Comment by Mathias Stearn [ 23/Sep/10 ]

Dwight, can we get rid of HostAndPort::Me() and just use me()? Using the real hostname seems likely to result in issues with --bind_ip

Comment by Gilles Devaux [ 22/Sep/10 ]

I have a fix here, tested for my use case (no seed in --replSet, rs.initiate())
http://github.com/gilles/mongo/

Feel free to use it. Two things:

  • I don't know the mongo code well enough to tell whether this change has side effects, so please double-check
  • My C / C++ is old, forgive me if it sucks

Comment by Gilles Devaux [ 22/Sep/10 ]

Correction to what I said: in the case of rs.initiate(), it seems the server uses gethostname(), not localhost

rs_initiate.cpp:190
members.append("0", BSON( "_id" << 0 << "host" << HostAndPort::Me().toString() ));

-> HostAndPort::Me() uses gethostname() while HostAndPort::me() uses 'localhost'

Comment by Gilles Devaux [ 22/Sep/10 ]

This does not work. It seems 'localhost' is transformed into the equivalent of `hostname -a` -> mongotest1. This name then maps to the public_ip of the machine, where the server still does not listen.

> rs.initiate()
{
"info2" : "no configuration explicitly specified – making one",
"errmsg" : "couldn't initiate : need members up to initiate, not ok : mongotest1:27017",
"ok" : 0
}

This is exactly: http://jira.mongodb.org/browse/SERVER-1775 except that I think the title of SERVER-1775 is wrong, ReplicaSets require listening on all interfaces.

Comment by Mathias Stearn [ 22/Sep/10 ]

Could you try --bind_ip "localhost,internal_ip" so that you are listening on both?

This may be related to http://jira.mongodb.org/browse/SERVER-1775

Comment by Gilles Devaux [ 21/Sep/10 ]

When using seeds

#common
bind_ip = 10.177.163.57
port = 27017
nssize = 16
verbose = true

#master / slave / pair / replSet (empty of none defined)
replSet = set1/10.177.163.57:27017,10.177.163.62:27017

> rs.initiate()
{
"startupStatus" : 4,
"info" : "set1/10.177.163.57:27017,10.177.163.62:27017",
"errmsg" : "all members and seeds must be reachable to initiate set",
"ok" : 0
}

probably because:
Tue Sep 21 16:56:41 [initandlisten] replSet ignoring seed 10.177.163.57:27017 (=self)

=> replSet is still looking for 'localhost'
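
To make the seed handling concrete, here is a rough sketch of how a replSet value of the form name/host1,host2 splits into a set name and a seed list, and what is left once the server's own address is dropped. Both functions are hypothetical illustrations; how mongod actually detects "self" is not modeled, so the caller supplies that address.

```javascript
// Hypothetical parser for a replSet value such as
// "set1/10.177.163.57:27017,10.177.163.62:27017".
function parseReplSetArg(value) {
  const slash = value.indexOf("/");
  // No slash means a bare set name with no seeds, e.g. "set1".
  if (slash === -1) return { name: value, seeds: [] };
  const name = value.slice(0, slash);
  const seeds = value
    .slice(slash + 1)
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s);
  return { name, seeds };
}

// Drops the seed equal to this server's own address. The self address is
// passed in by the caller; this is an assumption for illustration.
function seedsWithoutSelf(parsed, self) {
  return parsed.seeds.filter((seed) => seed !== self);
}

const parsed = parseReplSetArg("set1/10.177.163.57:27017,10.177.163.62:27017");
console.log(parsed.name);
console.log(seedsWithoutSelf(parsed, "10.177.163.57:27017"));
```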

Comment by Gilles Devaux [ 21/Sep/10 ]

Can you be more explicit when you say "internal ip"
Is that localhost? lan?

What happens when you don't use bind and do rs.add( "internal ip" )

the machines have three network interfaces: lo / eth0 / eth1

eth0 is assigned with a public_ip, accessible from the external world
eth1 is assigned with a private_ip, accessible from the internal network only

I have 2 machines, mongotest1 and mongotest2

case 1:
------------
mongod.cf:
#common
bind_ip = 10.177.163.57
port = 27017
#master / slave / pair / replSet (empty of none defined)
replSet = set1

> rs.initiate()
{
        "startupStatus" : 4,
        "info" : "set1",
        "errmsg" : "all members and seeds must be reachable to initiate set",
        "ok" : 0
}

case 2:
------------
mongod.cf:
#common
#bind_ip = 10.177.163.57
port = 27017
#master / slave / pair / replSet (empty of none defined)
replSet = set1

> rs.initiate()
{
        "startupStatus" : 4,
        "info" : "set1",
        "errmsg" : "all members and seeds must be reachable to initiate set",
        "ok" : 0
}
> rs.initiate()
{
"info2" : "no configuration explicitly specified – making one",
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
> rs.conf()
{
"_id" : "set1",
"version" : 1,
"members" : [
{ "_id" : 0, "host" : "mongotest1:27017" }
]
}
>

From here I can do:
rs.add(private_ip_mongotest2)
-> works

mongotest1
> rs.conf()
{
"_id" : "set1",
"version" : 2,
"members" : [
{ "_id" : 0, "host" : "mongotest1:27017" },
{ "_id" : 1, "host" : "10.177.163.62" }
]
}

mongotest2:
> rs.conf()
{
"_id" : "set1",
"version" : 2,
"members" : [
{ "_id" : 0, "host" : "mongotest1:27017" },
{ "_id" : 1, "host" : "10.177.163.62" }
]
}

But then the first member of the set resolves to the public_ip.
When the configuration is broadcast, mongotest2 will try to use the DNS name bound to the public_ip to communicate with mongotest1.

I'm reluctant to expose mongo to the external world for security reasons.

It seems that the replSet can't find 'self' when the server is bound to a private ip

I think I got it:
replSet adds configs.push_back( ReplSetConfig(HostAndPort::me()) ); automatically (rs.cpp:439)
HostAndPort::me() is hardcoded to ('localhost', cmdLine.port)
=> hostandport.h:46

The server is not bound to localhost when bind_ip is in the configuration

I will run some more tests adding seeds in the replSet config line

Comment by Eliot Horowitz (Inactive) [ 21/Sep/10 ]

Can you be more explicit when you say "internal ip"
Is that localhost? lan?

What happens when you don't use bind and do rs.add( "internal ip" )
