[SERVER-20088] Both secondaries crashed in replica set in MongoDB 3.0.4 Created: 23/Aug/15 Updated: 16/Nov/21 Resolved: 26/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Rakesh Kumar | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | bug |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | MongoDB 3.0.4 with a three-node replica set (1 primary, 2 secondaries) on AWS EC2 (Ubuntu 14.04) |
| Attachments: | |
| Operating System: | Linux |
| Steps To Reproduce: | It's a production system; I tried but couldn't replicate this in staging. |
| Participants: | |
| Description |
|
We have a 3-node replica set on MongoDB 3.0.4. Both secondaries crashed with an out-of-memory error, and as a result our primary stepped down to secondary. |
| Comments |
| Comment by Ramon Fernandez Marina [ 26/Aug/15 ] |
|
My apologies rakesh.mib.j, that was pilot error on my part: I skipped to the end looking for the backtrace and didn't see it. I can confirm that there were no index builds at the time, so that doesn't apply here.

The first thing to note is the WiredTiger cache setting: specifying a cache size of 12GB doesn't mean mongod will use only 12GB. This setting limits only the WiredTiger cache; mongod needs additional memory on top of it to operate. Note also that devoting 12GB to the cache leaves the OS strapped for file buffers, which will have a negative impact on performance. Bottom line, my first recommendation would be to let mongod use the default cache size.

This doesn't mean we're out of the woods: between 3.0.4 and 3.0.6 several issues with excessive memory consumption by WiredTiger have been fixed, so I'd also recommend upgrading to 3.0.6.

Since I think the behavior you're seeing is related to the large cache size these nodes are configured with, I'm going to close this ticket for now. If the issue persists in 3.0.6, feel free to post here again and we'll reopen the ticket for further investigation.

Regards, |
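To illustrate the recommendation above: a minimal sketch of the two startup variants, assuming the 12GB cache was set via the `--wiredTigerCacheSizeGB` startup flag (it could equally have been `storage.wiredTiger.engineConfig.cacheSizeGB` in the YAML config file); the dbpath and replica set name below are placeholders:

```
# As configured (explicit 12GB WiredTiger cache; placeholder paths/names):
mongod --replSet rs0 --dbpath /data/db --wiredTigerCacheSizeGB 12

# As recommended: omit the flag so mongod picks the default cache size,
# leaving headroom for mongod's non-cache allocations and OS file buffers:
mongod --replSet rs0 --dbpath /data/db
```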
| Comment by Rakesh Kumar [ 26/Aug/15 ] |
|
This log runs from the beginning, when the server was started, to the end, when it crashed. The server configuration is 4 cores and 15 GB RAM on AWS. |
| Comment by Ramon Fernandez Marina [ 24/Aug/15 ] |
|
rakesh.mib.j, thanks for updating the log and the config file, but unfortunately this log does not cover the time window we need to investigate this issue. The logs we need run from server restart until the time of the crash. We also need to know how much memory and swap space these secondaries have. Thanks, |
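For readers collecting these details, a minimal sketch of one way to do it on Linux; the log path is an assumption (the Ubuntu package default), and it relies on the `MongoDB starting` line that mongod logs at each startup:

```
# Print everything from the last startup marker to the end of the log,
# i.e. the window from server restart through the crash:
LOG=/var/log/mongodb/mongod.log   # path is an assumption
START=$(grep -n 'MongoDB starting' "$LOG" | tail -1 | cut -d: -f1)
tail -n "+$START" "$LOG"

# Memory and swap space on the box:
free -m
```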
| Comment by Rakesh Kumar [ 24/Aug/15 ] |
|
Thanks for the update. I don't think any index creation was going on at that time. |
| Comment by Ramon Fernandez Marina [ 24/Aug/15 ] |
|
rakesh.mib.j, were there any index builds happening on these secondaries before they crashed? Also, what's the memory configuration on these boxes, and what startup parameters are you using? My first guess is that you may be running into a known issue. Thanks, |
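To check the two things asked about here, a minimal sketch on Linux; the `Index Build` message text is how in-progress index builds typically appear in `db.currentOp()` output on 3.0, but treat the exact wording as an assumption:

```
# Startup parameters, exactly as mongod was invoked:
ps -eo args | grep '[m]ongod'

# List any in-progress index builds via the mongo shell:
mongo --quiet --eval 'db.currentOp().inprog.forEach(function(op) {
  if (op.msg && /Index Build/.test(op.msg)) printjson(op);
})'
```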