[SERVER-3963] Log Failed Heartbeat Polling Created: 27/Sep/11 Updated: 29/Feb/12 Resolved: 16/Nov/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kyle Banker | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Please provide more verbose logging around failed replica set heartbeats. We're seeing a number of cases where inexplicable re-elections are happening, and the logs aren't providing helpful feedback around what's happening. |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 16/Nov/11 ] |
|
Feel free to comment if anyone has any suggestions. |
| Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ] |
|
No, at the moment it always "fails fast" because it's a little complicated to figure out when to fail fast and when not to. We want to retry connecting if it's "likely" to be a network interruption (the member is in a different data center) and fail fast if the member is in the same data center. |
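To illustrate the retry policy described in the comment above, here is a minimal sketch. It is not the server's implementation: the `Member` type, the `heartbeat_retries` function, and the default of two retries are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical replica set member; 'datacenter' is an assumed label."""
    host: str
    datacenter: str


def heartbeat_retries(local: Member, remote: Member, cross_dc_retries: int = 2) -> int:
    """Return how many reconnect attempts to make after a failed heartbeat.

    Same data center: a socket error most likely means the member is down,
    so fail fast (zero retries). Different data center: the error is more
    likely a network interruption, so retry a few times first.
    """
    if local.datacenter == remote.datacenter:
        return 0
    return cross_dc_retries


if __name__ == "__main__":
    me = Member("db1:27017", "east")
    print(heartbeat_retries(me, Member("db2:27017", "east")))
    print(heartbeat_retries(me, Member("db3:27017", "west")))
```

The sketch assumes each member's data center is known up front, which is the part the comment calls complicated.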
| Comment by Kyle Banker [ 27/Sep/11 ] |
|
I thought there were attempts to reconnect, etc., before initiating failover. |
| Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ] |
|
I'm not sure how much more information we can give... all that the member knows is that it got an error on the socket. Any suggestions are welcome. |
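As one possible suggestion, the sketch below shows the kind of detail that can be recovered from a socket error alone: the symbolic error code, the OS message, the peer address, and how long the attempt took. It is a Python illustration only, not server code; `describe_heartbeat_failure` and its parameters are hypothetical.

```python
import errno


def describe_heartbeat_failure(host: str, port: int, exc: OSError, elapsed_s: float) -> str:
    """Build a log line from the information a socket error carries."""
    # Map the numeric errno to its symbolic name (e.g. ECONNREFUSED).
    # Timeouts carry no errno, so fall back to the exception class name.
    name = errno.errorcode.get(exc.errno, type(exc).__name__)
    detail = exc.strerror or str(exc)
    return (
        f"heartbeat to {host}:{port} failed after {elapsed_s:.3f}s: "
        f"{name} ({detail})"
    )


if __name__ == "__main__":
    err = ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
    print(describe_heartbeat_failure("db2", 27017, err, 0.25))
```

Distinguishing a refused connection from a reset or a timeout, together with the elapsed time, may help tell a crashed member apart from a network interruption.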