[SERVER-40622] MongoDB replica set outage “due to bad connection status” Created: 12/Apr/19  Updated: 13/May/19  Resolved: 13/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.16
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Cassio Mosqueira
Assignee: Eric Sedor
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: m1-log (secondary).txt, m2-log (secondary).txt, mongo-log.txt
Participants:

 Description   

I have a production replica set that went down for 3-4 minutes, apparently because the Primary could not connect to one of the Secondaries. I'm having a hard time understanding why the Primary kept stepping down and up.

I've been managing this replica set for 3 years and have never seen this happen.

Here's the replica set structure:

PRIMARY: m4.mydomain.com (104.167.32.55) - Priority 10
SECONDARY: m2.mydomain.com (162.221.2.98) - Priority 5
SECONDARY: m1.mydomain.com (40.113.11.54) - Priority 1

 

The log for the period is attached.



 Comments   
Comment by Eric Sedor [ 13/May/19 ]

Hi cassioam@gmail.com, we really appreciate your patience with this delay.

It's clear from the logs and diagnostic data you provided that the replica set experienced clock skew on the m4 node; we aren't able to identify a bug.

I'm going to close this ticket for now, but please open a new one if you still experience issues after confirming that all nodes have synchronized clocks.

Thank you,
Eric
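
For readers who want to check for the kind of clock skew described in this comment, the following is a minimal sketch, not the method used in the analysis above. It assumes you have already collected one wall-clock reading per node at roughly the same moment (for example from the localTime field returned by the serverStatus command). The function names, the 1-second tolerance, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def clock_offsets(readings, reference):
    """Return each node's clock offset in seconds relative to the reference node."""
    base = readings[reference]
    return {host: (ts - base).total_seconds() for host, ts in readings.items()}


def skewed_nodes(readings, reference, tolerance_s=1.0):
    """List the nodes whose clock differs from the reference by more than tolerance_s."""
    offsets = clock_offsets(readings, reference)
    return sorted(host for host, off in offsets.items() if abs(off) > tolerance_s)


# Hypothetical readings, invented for illustration; not taken from the ticket's logs.
t0 = datetime(2019, 4, 12, 10, 0, 0)
readings = {
    "m1.mydomain.com": t0,
    "m2.mydomain.com": t0 + timedelta(milliseconds=200),
    "m4.mydomain.com": t0 + timedelta(seconds=40),
}
print(skewed_nodes(readings, reference="m1.mydomain.com"))
```

Readings taken over the network include request latency, so small offsets are expected; only differences well beyond the round-trip time suggest real skew.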

Comment by Cassio Mosqueira [ 16/Apr/19 ]

Thank you for providing the private portal. I uploaded all 3 diagnostic files. They are named diagnostic.data-m4.zip, diagnostic.data-m2.zip, and diagnostic.data-m1.zip.

Note that the log files I uploaded here in the public ticket have the real subdomains (m1, m2, m4), but a mock domain. Also, you can use the first two parts of the IP addresses in the log files in case you need to match them with the real ones.

Comment by Eric Sedor [ 15/Apr/19 ]

Yes, I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Comment by Cassio Mosqueira [ 12/Apr/19 ]

I added the logs for the secondaries. Is there a secure channel through which I can upload the diagnostic files? I'd rather not expose real IP addresses and domains on a public platform. Thanks.

Comment by Eric Sedor [ 12/Apr/19 ]

Hello, can you please attach the logs for each node in the set at the time of an incident? And could you please also archive (tar or zip) the $dbpath/diagnostic.data directory from each server (described here) and attach it to this ticket?
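
The archiving step requested above could be sketched as follows. This is one possible way to do it, assuming Python is available on the server; the function name archive_diagnostic_data and the output file name are hypothetical, and the dbpath argument must be replaced with the node's actual data directory.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, out_file):
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar archive."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    with tarfile.open(str(out_file), "w:gz") as tar:
        # arcname stores only the directory name, not the full server path
        tar.add(str(source), arcname=source.name)
    return Path(out_file)
```

Calling archive_diagnostic_data with the node's dbpath and an output name such as "diagnostic.data-m4.tar.gz" would produce one archive per server, ready to attach or upload.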
