[SERVER-3963] Log Failed Heartbeat Polling Created: 27/Sep/11  Updated: 29/Feb/12  Resolved: 16/Nov/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kyle Banker Assignee: Kristina Chodorow (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Please provide more verbose logging around failed replica set heartbeats. We're seeing a number of cases where re-elections happen for no apparent reason, and the logs don't give enough detail to explain what's going on.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 16/Nov/11 ]

Feel free to comment if anyone has any suggestions.

Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ]

No, at the moment it always "fails fast", because it's a little complicated to figure out when to fail fast and when not to. We want to retry connecting if the failure is "likely" to be a network interruption (the member is in a different data center) and fail fast if the member is in the same data center.
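The policy described above can be sketched roughly as follows. This is a hypothetical Python illustration, not mongod's actual heartbeat code (which is C++). The `Member.data_center` field, the `send_heartbeat` callback, and the `max_retries` default are all assumptions made for the example, since the comment doesn't say how a member's data center would be determined. Each failed attempt is logged with the peer and the socket error, which is the kind of detail the original request asks for.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("heartbeat")


@dataclass
class Member:
    host: str
    data_center: str  # hypothetical tag; how a member's DC is known is an assumption


def should_retry(self_dc: str, peer: Member) -> bool:
    # Retry only when a failure is "likely" to be a transient network
    # interruption, which the comment associates with a peer in another DC.
    return peer.data_center != self_dc


def heartbeat_with_policy(send_heartbeat, self_dc: str, peer: Member,
                          max_retries: int = 3) -> bool:
    """Return True if a heartbeat succeeds, False if the peer should be treated as down.

    send_heartbeat is a hypothetical callable that raises OSError on socket errors.
    """
    # Same-DC peers get a single attempt (fail fast); remote peers get retries.
    attempts = 1 + (max_retries if should_retry(self_dc, peer) else 0)
    for attempt in range(1, attempts + 1):
        try:
            send_heartbeat(peer)
            return True
        except OSError as exc:
            # Record which peer failed, on which attempt, and the socket error.
            log.warning("heartbeat to %s (dc=%s) failed, attempt %d/%d: %s",
                        peer.host, peer.data_center, attempt, attempts, exc)
    return False
```

With a peer in another data center, a couple of transient socket errors are absorbed by the retries; with a peer in the same data center, the first error marks it as down.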

Comment by Kyle Banker [ 27/Sep/11 ]

I thought there were attempts to reconnect, etc., before initiating failover.

Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ]

I'm not sure how much more information we can give... all that the member knows is that it got an error on the socket. Any suggestions are welcome.

Generated at Thu Feb 08 03:04:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.