[SERVER-12477] IP address resolution for the replica set node doesn't work properly. Created: 25/Jan/14 Updated: 18/Feb/14 Resolved: 18/Feb/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Kubenko | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Windows |
| Steps To Reproduce: | Turn off one replica set member and return it with different IP. |
| Participants: |
| Description |
|
When a member of a replica set goes down and comes back with a different IP address, the other members cannot find that node. After I ran nslookup for the name, it was found. You can find the log below.

A bit more about how this happens: in the Azure datacenter, when I resize a VM or the VM is moved, it gets another internal IP address. After that it comes back with the new address and cannot be resolved.

    Fri Jan 24 23:36:42.497 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
    Fri Jan 24 23:36:50.040 [conn12558] end connection 10.175.112.48:49158 (5 connections now open)
    Fri Jan 24 23:36:50.992 [rsHealthPoll] replset info MongoNode03:10001 thinks that we are down

Here are the details of the configuration where the problem appears: |
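For illustration only (not part of the original report), a minimal sketch of the kind of lookup check that confirms the symptom from the affected node; the hostname and port are taken from the log above, and the script itself is hypothetical:

    # check_resolution.py - sketch: verify whether the replica set member's
    # hostname resolves from this node, the same lookup rsHealthPoll depends on.
    import socket

    HOST = "MongoNode03"   # hostname from the log above
    PORT = 10001           # replica set member port from the log above

    try:
        addrs = socket.getaddrinfo(HOST, PORT, socket.AF_INET, socket.SOCK_STREAM)
        for family, socktype, proto, canonname, sockaddr in addrs:
            print("resolved:", sockaddr)
    except socket.gaierror as exc:
        # A failure here that disappears after running 'nslookup MongoNode03'
        # points at a stale (negative) DNS cache entry rather than a mongod problem.
        print("resolution failed:", exc)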
| Comments |
| Comment by Kubenko [ 17/Feb/14 ] |
|
I think we should close this, as the issue has nothing to do with MongoDB. It is an OS configuration matter.
| Comment by Kubenko [ 17/Feb/14 ] |
|
Hi Mark,

Sorry for the delay. I don't have any of the values mentioned in the article in the VM registry. Today I did a test: I provisioned a VM. It started at 23:08 with the new IP.

First rs.status() excerpt (the other members were truncated in the original output):

    {
        "_id" : 2,
        "name" : "PRJNAME04hidden:10001",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 6598,
        "optime" : Timestamp(1392641223, 1),
        "optimeDate" : ISODate("2014-02-17T12:47:03Z"),
        "lastHeartbeat" : ISODate("2014-02-17T23:18:23Z"),
        "lastHeartbeatRecv" : ISODate("2014-02-17T23:18:23Z"),
        "pingMs" : 1,
        "syncingTo" : "PRJNAME02:10001"
    }

Second rs.status() excerpt, taken a few minutes later (other members again truncated):

    {
        "_id" : 2,
        "name" : "PRJNAME04hidden:10001",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 6833,
        "optime" : Timestamp(1392641223, 1),
        "optimeDate" : ISODate("2014-02-17T12:47:03Z"),
        "lastHeartbeat" : ISODate("2014-02-17T23:22:19Z"),
        "lastHeartbeatRecv" : ISODate("2014-02-17T23:22:19Z"),
        "pingMs" : 1,
        "syncingTo" : "PRJNAME02:10001"
    }

At 2014-02-17T23:18:25Z it was not available, but at 2014-02-17T23:22:19Z it was starting to come back; it looks like the default delay is 10 or 15 minutes. |
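As a side note (not from the original comment), the same observation can be made programmatically by polling replSetGetStatus and watching each member's lastHeartbeat; this is only a sketch, the hostname and port are placeholders, and pymongo is assumed to be available:

    # watch_heartbeats.py - sketch: poll replSetGetStatus and print each member's
    # state and lastHeartbeat so the recovery delay after an IP change is visible.
    import time
    from pymongo import MongoClient

    # Placeholder host/port; any reachable replica set member works.
    client = MongoClient("PRJNAME02", 10001)

    while True:
        status = client.admin.command("replSetGetStatus")
        for member in status["members"]:
            # The entry for the node itself has no lastHeartbeat, hence .get().
            print(member["name"], member["stateStr"],
                  "lastHeartbeat:", member.get("lastHeartbeat"))
        print("---")
        time.sleep(30)   # poll every 30 seconds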
| Comment by Mark Benvenuto [ 28/Jan/14 ] |
|
Do you have negative DNS lookup caching enabled on your Windows VMs? Unfortunately, due to a bug in 2.4.8 the error message is unclear as to the exact cause: "000000014109CBD0". I expect that when you changed the configuration of your Azure tenant, the host lookup failed on the DNS server, the failure was cached by the local machine, and it was not corrected until you ran an nslookup. See http://adminfoo.net/2004/10/dont-cache-negative-dns-lookups-on.html
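Not part of the original comment, but one way to see whether any negative-cache tuning is set on the VM is to list the DNS Client (Dnscache) service parameters from the registry; the value names mentioned in the comments above (e.g. MaxNegativeCacheTtl) may simply be absent, as Kubenko found, in which case the Windows defaults apply. A minimal sketch:

    # check_dns_negative_cache.py - sketch: list any DNS Client (Dnscache)
    # parameters set on a Windows VM, to see whether negative-lookup caching
    # has been tuned away from the defaults.
    import winreg

    KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Dnscache\Parameters"

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        index = 0
        while True:
            try:
                name, value, value_type = winreg.EnumValue(key, index)
            except OSError:
                break   # no more values under this key
            # Values such as MaxNegativeCacheTtl, if present, control how long
            # failed lookups are cached; an empty listing means defaults apply.
            print(name, "=", value)
            index += 1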