Type: Improvement
Resolution: Incomplete
Priority: Major - P3
Affects Version/s: None
Component/s: Networking, Replication
Currently, mongod will log messages like:
Fri Mar 1 16:06:04 [rsHealthPoll] replSet info otherreplicasetmember:27017 is down (or slow to respond): DBClientBase::findN: transport error: otherreplicasetmember:27017 query: { replSetHeartbeat: "rs", v: 3, pv: 1, checkEmpty: false, from: "otherreplicasetmember2:27017" }
Fri Mar 1 16:06:04 [rsHealthPoll] replSet member otherreplicasetmember:27017 is now in state DOWN
However, "down (or slow to respond)" does not say much about the actual state of otherreplicasetmember.
Instead, some more details about the connection to otherreplicasetmember could be added:
1) If the mongod is no longer listening on the port (or a firewall is actively rejecting the connection), then attempting to connect to it should usually fail with a TCP "Connection refused" error. This would be useful to know.
2) If the TCP connection cannot be established at all (say, because a firewall is silently dropping the SYN packet), the connection attempt will time out rather than be refused. This would be useful to know.
3) If the TCP connection is established but no responses are being received over it, then the current message is appropriate.
4) If the host is up and responding to pings/heartbeats, but slowly, it would be very useful for investigators to be able to distinguish this from #3.
And of course, a member might start in state #4 and transition to another state such as #2 or #3. Logging each of these transitions would be useful.
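The four cases above could be told apart roughly as in the following sketch. It is written in Python for brevity and is not mongod's heartbeat code: the names `PeerState`, `probe_connect` and `classify_heartbeat`, the timeout values, and the "slow" threshold are all hypothetical, chosen only to illustrate the distinction being requested.

```python
import socket
from enum import Enum
from typing import Optional


class PeerState(Enum):
    # Hypothetical labels for the cases listed in this ticket.
    REFUSED = "connection refused"                    # case 1
    CONNECT_TIMEOUT = "connect timed out"             # case 2
    UNREACHABLE = "other network error"               # e.g. DNS failure, no route
    NO_RESPONSE = "connected, no heartbeat response"  # case 3
    SLOW = "responding slowly"                        # case 4
    OK = "responding normally"


def probe_connect(host: str, port: int, timeout_s: float = 2.0) -> Optional[PeerState]:
    """Try a TCP connect; return a failure state, or None if it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return None
    except ConnectionRefusedError:
        # The peer (or a rejecting firewall) answered the SYN with a RST.
        return PeerState.REFUSED
    except socket.timeout:
        # Nothing came back for the SYN within timeout_s.
        return PeerState.CONNECT_TIMEOUT
    except OSError:
        return PeerState.UNREACHABLE


def classify_heartbeat(elapsed_s: Optional[float], slow_after_s: float = 1.0) -> PeerState:
    """Classify a heartbeat on an established connection.

    elapsed_s is the round-trip time in seconds, or None if no reply arrived.
    """
    if elapsed_s is None:
        return PeerState.NO_RESPONSE
    if elapsed_s > slow_after_s:
        return PeerState.SLOW
    return PeerState.OK
```

A caller would log the resulting state each time it differs from the previous one for that member, which would also capture transitions such as #4 to #3.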