Details
Type: Bug
Resolution: Done
Priority: Critical - P2
Affects Version: 2.4.8
Operating System: Windows
Description
When a member of a replica set goes down and comes back with a different IP address, the replica set can't find that node. After I ran nslookup for its host name, the node was found. The log is below.
Judging by that log, I ran
nslookup MongoNode03
approximately at Fri Jan 24 23:36:46.995.
A bit more about how this happens: when I resize a VM in an Azure datacenter, or the VM is moved, it gets a different internal IP address. After that it comes back up with the new address, and its host name cannot be resolved.
Fri Jan 24 23:36:42.497 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:44.527 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.527 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:44.871 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.871 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:44.902 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.902 [rsHealthPoll] replset info MongoNode03:10001 heartbeat failed, retrying
Fri Jan 24 23:36:44.933 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.933 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:44.965 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.965 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:44.996 [rsHealthPoll] getaddrinfo("MongoNode03") failed: 000000014109CBD0
Fri Jan 24 23:36:44.996 [rsHealthPoll] couldn't connect to MongoNode03:10001: couldn't connect to server MongoNode03:10001
Fri Jan 24 23:36:46.995 [rsHealthPoll] replSet member MongoNode03:10001 is up
Fri Jan 24 23:36:50.040 [initandlisten] connection accepted from 10.175.112.48:49158 #12558 (6 connections now open)
Fri Jan 24 23:36:50.040 [conn12558] authenticate db: local
Fri Jan 24 23:36:50.040 [conn12558] end connection 10.175.112.48:49158 (5 connections now open)
Fri Jan 24 23:36:50.040 [initandlisten] connection accepted from 10.175.112.48:49159 #12559 (6 connections now open)
Fri Jan 24 23:36:50.040 [conn12559] authenticate db: local
Fri Jan 24 23:36:50.992 [rsHealthPoll] replset info MongoNode03:10001 thinks that we are down
Fri Jan 24 23:36:50.992 [rsHealthPoll] replSet member MongoNode03:10001 is now in state STARTUP2
Fri Jan 24 23:37:06.265 [conn12559] end connection 10.175.112.48:49159 (5 connections now open)
Here are the details of the configuration where the problem appears:
Fri Jan 24 15:48:06.397 [initandlisten] git version: a350fc38922fbda2cec8d5dd842237b904eafc14
Fri Jan 24 15:48:06.397 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
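To check the getaddrinfo failures from the log independently of mongod, something like the following Python sketch could be run on the affected member. It is not part of MongoDB; the function names resolve and wait_until_resolvable, the retry count, and the delay are all assumptions chosen for illustration. It simply polls the system resolver until the host name resolves and reports on which attempt that happened.

```python
import socket
import time


def resolve(host, port, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for host, or [] if the lookup fails."""
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same class of failure that mongod logs as getaddrinfo(...) failed.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def wait_until_resolvable(host, port, attempts=10, delay=2.0,
                          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Poll the resolver; return (attempt_number, addresses).

    addresses is [] if the name never resolved within the given attempts.
    """
    for attempt in range(1, attempts + 1):
        addresses = resolve(host, port, resolver)
        if addresses:
            return attempt, addresses
        sleep(delay)
    return attempts, []
```

For example, calling wait_until_resolvable("MongoNode03", 10001) right after the VM comes back would show whether the name starts resolving on its own or only after a manual nslookup.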