[SERVER-33809] Secondary member crashed due to OOM in production Created: 12/Mar/18 Updated: 27/Oct/23 Resolved: 13/Mar/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Roie Yossef | Assignee: | Dmitry Agranat |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Participants: | |||||
| Description |
|
Hi All, MongoDB is the only thing we run on the machine (using Docker). I've checked the system logs and found this: I do have the diagnostic data but am not sure how to analyse it (see attached). We don't have any swap memory on the machine; is it recommended to add some? It's a production machine and this is the first time we have encountered a failure, so it's important to us to understand the root cause and what is necessary to avoid such cases. Thanks |
| Comments |
| Comment by Dmitry Agranat [ 13/Mar/18 ] |
|
Hi roiey, All good questions. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. Questions like this involving more discussion would be best posted on the mongodb-users group. Thanks, |
| Comment by Roie Yossef [ 13/Mar/18 ] |
|
Hi Dima, regarding the disks, what do you consider poor IO performance? What disks should we have? We use an AWS EBS volume, io1 type with 5000 IOPS. |
| Comment by Dmitry Agranat [ 13/Mar/18 ] |
|
Hi roiey, After looking at the provided data, I do not see an indication of a bug. I believe that the reported issue is due to insufficient available memory (4GB) relative to your workload. As the memory required for operations grows, the working set in the WT cache cannot be pushed back onto disk. Instead, the operating system begins swapping application memory to disk until it runs out of space and kills the process that is using the most memory, which, in this case, is MongoDB. Based on the additional metrics observed (see below), it appears that this server is also suffering from poor IO performance and mostly reading the data from either the FS cache or disk.
All of the above indicates that the server is underprovisioned to sustain your workload. Thanks, |
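As a rough illustration of the memory arithmetic behind this diagnosis, the sketch below estimates the default WiredTiger cache size and the memory left over for everything else. It assumes MongoDB 3.4 or later, where the documented default cache is the larger of 50% of (RAM - 1 GB) and 256 MB; the ticket does not state the server version, so that is an assumption. The function names are hypothetical and are not part of any MongoDB API. Note also that inside a container mongod may size the cache from the host's memory rather than the container limit, in which case `storage.wiredTiger.engineConfig.cacheSizeGB` would need to be set explicitly.

```python
from typing import Optional


def default_wt_cache_gb(ram_gb: float) -> float:
    """Estimate the default WiredTiger cache size in GB.

    Assumes the MongoDB 3.4+ rule: the larger of
    50% of (RAM - 1 GB) and 0.25 GB (256 MB).
    """
    return max(0.5 * (ram_gb - 1.0), 0.25)


def headroom_gb(ram_gb: float, cache_gb: Optional[float] = None) -> float:
    """Memory left for connections, in-memory sorts, aggregations,
    the filesystem cache, and the OS once the WT cache is full."""
    cache = default_wt_cache_gb(ram_gb) if cache_gb is None else cache_gb
    return ram_gb - cache


if __name__ == "__main__":
    # 4 GB is the figure quoted in the ticket; 8 and 16 are for comparison.
    for ram in (4, 8, 16):
        print(
            f"RAM {ram} GB: default WT cache {default_wt_cache_gb(ram):.2f} GB, "
            f"headroom {headroom_gb(ram):.2f} GB"
        )
```

On a 4 GB machine this leaves about 2.5 GB outside the cache, which has to cover all per-operation memory as well as the operating system. With no swap configured, a spike beyond that headroom can lead straight to the kernel OOM killer.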