[SERVER-40622] MongoDB replica set outage “due to bad connection status” Created: 12/Apr/19 Updated: 13/May/19 Resolved: 13/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.16 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Cassio Mosqueira | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Participants: |
| Description |
|
I have a production replica set that went down for 3-4 minutes, apparently because the Primary could not connect to one of the Secondaries. I'm having a hard time understanding why the Primary kept stepping down and back up. I've been managing this replica set for 3 years and have never seen this happen before. Here's the replica set structure:
PRIMARY: m4.mydomain.com (104.167.32.55) - Priority 10
The log for the period is attached. |
| Comments |
| Comment by Eric Sedor [ 13/May/19 ] |
|
Hi cassioam@gmail.com, we really appreciate your patience with this delay. It's clear from the logs and diagnostic data you provided that the replica set experienced clock skew on the m4 node; we aren't able to identify a bug. I'm going to close this ticket for now, but please open a new one if you still experience issues after confirming that the clocks on all nodes are synchronized. Thank you, |
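
For readers who want to check for clock skew like that diagnosed above, the following is a minimal sketch rather than MongoDB tooling. It assumes you have already collected one timestamp per node at roughly the same moment (for example, the `localTime` field returned by the `serverStatus` command on each member). The function name `max_clock_skew` and the sample timestamps are hypothetical; only the node names m1, m2, and m4 come from this ticket. Network round-trip time is not accounted for, so small differences should not be over-interpreted.

```python
from datetime import datetime, timedelta, timezone
from itertools import combinations


def max_clock_skew(samples):
    """Return (skew, host_a, host_b) for the two hosts whose clocks differ most.

    `samples` maps a host name to the datetime that host reported.
    All samples are assumed to have been taken at about the same instant.
    """
    if len(samples) < 2:
        raise ValueError("need timestamps from at least two hosts")
    # Compare every pair of hosts and keep the pair with the largest gap.
    host_a, host_b = max(
        combinations(sorted(samples), 2),
        key=lambda pair: abs(samples[pair[0]] - samples[pair[1]]),
    )
    return abs(samples[host_a] - samples[host_b]), host_a, host_b


# Hypothetical readings: m4 is 45 seconds ahead of the other two nodes.
base = datetime(2019, 4, 12, 12, 0, 0, tzinfo=timezone.utc)
readings = {
    "m1": base,
    "m2": base,
    "m4": base + timedelta(seconds=45),
}
skew, first, second = max_clock_skew(readings)
print(f"largest skew: {skew.total_seconds():.0f}s between {first} and {second}")
```

A pairwise comparison is used instead of comparing against a single reference node so that the result does not depend on which node happens to be treated as correct.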
| Comment by Cassio Mosqueira [ 16/Apr/19 ] |
|
Thank you for providing the private portal. I uploaded all 3 diagnostic files. They are named diagnostic.data-m4.zip, diagnostic.data-m2.zip, and diagnostic.data-m1.zip. Note that the log files I uploaded here in the public ticket have the real subdomains (m1, m2, m4), but a mock domain. Also, you can use the first two parts of the IP addresses in the log files in case you need to match them with the real ones. |
| Comment by Eric Sedor [ 15/Apr/19 ] |
|
Yes, I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. |
| Comment by Cassio Mosqueira [ 12/Apr/19 ] |
|
I added the logs for the secondaries. Is there a secure channel through which I can upload the diagnostic files? I'd rather not expose real IP addresses and domains on a public platform. Thanks. |
| Comment by Eric Sedor [ 12/Apr/19 ] |
|
Hello, can you please attach the logs for each node in the set at the time of the incident? And could you please also archive (tar or zip) the $dbpath/diagnostic.data directory from each server (described here) and attach it to this ticket? |
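
The archiving step requested above could be scripted along these lines. This is only a sketch under stated assumptions: the helper name `archive_diagnostic_data`, the `node_name` label, and the `.tar.gz` output naming are illustrative choices, not something prescribed in this ticket, and the `dbpath` you pass must be the actual data directory of your mongod.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, node_name, out_dir="."):
    """Create diagnostic.data-<node_name>.tar.gz from <dbpath>/diagnostic.data.

    Returns the path of the archive that was written.
    """
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")

    archive_path = Path(out_dir) / f"diagnostic.data-{node_name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store the files under a node-specific top-level folder so that
        # archives from different nodes do not collide when extracted.
        archive.add(source, arcname=f"diagnostic.data-{node_name}")
    return archive_path


# Example call (the dbpath shown is an assumption; substitute your own):
# archive_diagnostic_data("/var/lib/mongodb", "m4")
```

Running it once per server with a distinct `node_name` yields one archive per node, which keeps the files easy to tell apart after upload.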