[SERVER-9504] More Details of Replica Set Member Health/State Created: 29/Apr/13  Updated: 06/Dec/22  Resolved: 23/Aug/18

Status: Closed
Project: Core Server
Component/s: Networking, Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Stephen Lee Assignee: Backlog - Replication Team
Resolution: Incomplete Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Replication
Participants:

 Description   

Currently, mongod will log messages like

Fri Mar  1 16:06:04 [rsHealthPoll] replSet info otherreplicasetmember:27017 is down (or slow to respond): DBClientBase::findN: transport error: otherreplicasetmember:27017 query: { replSetHeartbeat: "rs", v: 3, pv: 1, checkEmpty: false, from: "otherreplicasetmember2:27017" }
Fri Mar  1 16:06:04 [rsHealthPoll] replSet member otherreplicasetmember:27017 is now in state DOWN

but "down (or slow to respond)" says little about the actual state of otherreplicasetmember.

Instead, some more details about the connection to otherreplicasetmember could be added:

1) If mongod is no longer listening on the port (or a firewall is actively rejecting the connection), then attempting to connect to it should usually result in a TCP "Connection refused" error. This would be useful to know.
2) If the TCP connection cannot be established at all (say, a firewall is blocking the connection and swallowing the SYN packet without response), the connection attempt times out instead. This would be useful to know.
3) If the TCP connection is alive, but no responses are being received over the connection, then the current message is appropriate.
4) If the host is up and is responding to pings/heartbeats, but slowly, this would be very useful for investigators to distinguish from #3.

And of course, a member might start in state #4 and transition to another state such as #2 or #3. All of these transitions would be useful to log.
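
For illustration, the four cases above can be told apart from the client side with an ordinary TCP probe. The following is only a rough sketch in Python, not mongod's heartbeat code: the names PeerState and probe, the "ping" payload, and the timeout and slow_after thresholds are all hypothetical, and a real check would send the replSetHeartbeat command rather than a raw byte string.

```python
import socket
import time
from enum import Enum


class PeerState(Enum):
    """Hypothetical labels for the cases listed in the description."""
    REFUSED = "connection refused"            # case 1
    CONNECT_TIMEOUT = "connect timed out"     # case 2
    NO_RESPONSE = "connected, no response"    # case 3
    SLOW = "responding slowly"                # case 4
    OK = "responding"
    OTHER_ERROR = "other network error"


def probe(host, port, payload=b"ping\n", connect_timeout=2.0,
          read_timeout=2.0, slow_after=0.5):
    """Classify a peer by how a TCP connect and one request/reply behave."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except ConnectionRefusedError:
        # The peer (or a rejecting firewall) answered the SYN with a reset.
        return PeerState.REFUSED
    except socket.timeout:
        # Nothing answered the SYN at all, e.g. packets silently dropped.
        return PeerState.CONNECT_TIMEOUT
    except OSError:
        return PeerState.OTHER_ERROR

    with sock:
        sock.settimeout(read_timeout)
        started = time.monotonic()
        try:
            sock.sendall(payload)
            data = sock.recv(1)
        except socket.timeout:
            # The connection is up, but no reply arrived in time.
            return PeerState.NO_RESPONSE
        except OSError:
            return PeerState.OTHER_ERROR
        elapsed = time.monotonic() - started

    if not data:
        # The peer closed the connection without replying.
        return PeerState.OTHER_ERROR
    return PeerState.SLOW if elapsed > slow_after else PeerState.OK
```

Logging the returned state whenever it differs from the previous probe's state would also record transitions such as #4 to #3.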



 Comments   
Comment by Spencer Brody (Inactive) [ 23/Aug/18 ]

This ticket is very old, and the log messages have changed a lot since then. We aren't sure whether this is still a problem; if it is, please open a new ticket.
