[SERVER-41066] All cluster PRIMARY mongod was killed by oomkiller in a few seconds Created: 09/May/19 Updated: 24/Jun/19 Resolved: 24/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | 4.0.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Artem | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
We had several incidents in which the PRIMARY mongod of every heavily loaded cluster was killed by the OOM killer within a few seconds. On the graph, it looked like the mongod process doubled its memory usage from 64GB within a few seconds. During the investigation, we found log lines like:
and:
We limited the batch size in one of our service tasks, which reduced the BSON size, and the problem was solved. We also found in the logs a few spikes in memory consumption with similar symptoms that did not trigger the OOM killer. This does not look like a memory leak, since memory consumption returned to its normal level afterwards. In terms of mongod behavior, I am concerned about the following points:
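For illustration only, the batch-size mitigation mentioned in this description might be sketched as below. This is not the reporter's actual code. The names `bounded_batches` and `approx_size` and the default limits are hypothetical. The JSON-encoded length is used as a rough stand-in for the real BSON size, which a driver would compute differently.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def approx_size(doc: Doc) -> int:
    """Rough proxy for the encoded size of a document (assumption: JSON bytes, not BSON)."""
    return len(json.dumps(doc, default=str).encode("utf-8"))


def bounded_batches(
    docs: Iterable[Doc],
    max_docs: int = 1000,          # hypothetical limit on documents per batch
    max_bytes: int = 1_000_000,    # hypothetical limit on approximate bytes per batch
    size_of: Callable[[Doc], int] = approx_size,
) -> Iterator[List[Doc]]:
    """Yield lists of documents that stay within both limits.

    A single document larger than max_bytes is still yielded, alone in its own batch.
    """
    batch: List[Doc] = []
    batch_bytes = 0
    for doc in docs:
        doc_bytes = size_of(doc)
        # Close the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

Each yielded batch would then be sent as its own request by whatever operation the task issues, so that no single request carries an unbounded payload. Suitable limits depend on the workload and are not stated in this ticket.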
|
| Comments |
| Comment by Eric Sedor [ 24/Jun/19 ] |
|
Hi, we haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Sincerely, |
| Comment by Eric Sedor [ 03/Jun/19 ] |
|
Hi bozaro, we still need additional information to diagnose the problem. If this is still an issue for you, would you please upload the log files and the $dbpath/diagnostic.data directory covering one or more of these incidents to this secure private portal, and provide a timeline for the incident(s)? Thanks in advance. |
| Comment by Bruce Lucas (Inactive) [ 09/May/19 ] |
|
bozaro, so that we can investigate these issues further, would you be able to upload the log files and the $dbpath/diagnostic.data directory covering one or more of these incidents? You can upload them to this secure private portal. Also please tell us the timeline of the incident(s) covered by the data. |