[SERVER-20088] both secondary crashed in replicaset in mongo 3.0.4 Created: 23/Aug/15  Updated: 16/Nov/21  Resolved: 26/Aug/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rakesh Kumar Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB 3.0.4, three-node replica set (1 primary, 2 secondaries) on AWS EC2 (Ubuntu 14.04)


Attachments: HTML File mongo primary, File mongo sec. 1, File mongo sec. 2, File mongod-secondary.conf, Text File mongod.log
Operating System: Linux
Steps To Reproduce:

This is a production system; I tried but could not replicate the issue in staging.

Participants:

Description

We have a 3-node replica set on MongoDB 3.0.4. Both secondaries crashed with an out-of-memory error, and as a result our primary stepped down to secondary.
All logs are attached.



Comments
Comment by Ramon Fernandez Marina [ 26/Aug/15 ]

My apologies rakesh.mib.j, that was pilot error on my part: I skipped to the end looking for the backtrace and I didn't see it. I can confirm that there were no index builds at the time, so SERVER-18829 is not the issue you're running into.

The first thing to note is about the WiredTiger cache setting: specifying a cache size of 12GB doesn't mean mongod will use only 12GB. This setting limits only the WiredTiger cache; mongod needs additional memory beyond that to operate. Note also that using 12GB means the OS will be strapped for file buffers, which will have a negative impact on performance. Bottom line: my first recommendation would be to let mongod use the default cache size.
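For reference, here is a minimal sketch of how such a cache limit typically appears in a YAML-format mongod.conf. The contents below are illustrative assumptions only, not copied from the attached mongod-secondary.conf:

    storage:
      dbPath: /var/lib/mongodb          # assumed data path, for illustration only
      engine: wiredTiger
      wiredTiger:
        engineConfig:
          cacheSizeGB: 12               # explicit 12GB WiredTiger cache limit
    # Omitting cacheSizeGB entirely lets mongod choose the default cache size,
    # which is what is recommended above.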

This doesn't mean we're out of the woods: between 3.0.4 and 3.0.6 several fixes for excessive memory consumption by WiredTiger have landed, so I'd also recommend upgrading to 3.0.6.

Since I think the behavior you're seeing is related to the large cache size these nodes are configured with, I'm going to close this ticket for now. If the issue persists on 3.0.6, feel free to post here again and we'll reopen the ticket for further investigation.

Regards,
Ramón.

Comment by Rakesh Kumar [ 26/Aug/15 ]

This log covers the whole period from server start to the crash. The server configuration is 4 cores and 15 GB RAM on AWS.
We don't have swap space on these servers.

Comment by Ramon Fernandez Marina [ 24/Aug/15 ]

rakesh.mib.j, thanks for uploading the log and the config file, but unfortunately this log does not cover the time window we need to investigate this issue. The logs we need run from server restart until the time of the crash.

We also need to know how much memory and swap space these secondaries have.

Thanks,
Ramón.

Comment by Rakesh Kumar [ 24/Aug/15 ]

Thanks for the update. I don't think any index creation was going on at that time.
I've attached the secondary's full log file ("mongod.log") and the mongod configuration file ("mongod-secondary.conf").

Comment by Ramon Fernandez Marina [ 24/Aug/15 ]

rakesh.mib.j, were there any index builds happening on these secondaries before they crashed? Also, what is the memory configuration on these boxes, and what are the startup parameters? My first guess is that you may be running into SERVER-18829, but I'd need you to provide the details above; ideally, if you could upload the full logs from the last restart for at least one of the crashed secondaries, we could check this hypothesis quickly.

Thanks,
Ramón.
