[SERVER-34398] Mongo WiredTiger Memory Spike And OOM Issue Created: 09/Apr/18 Updated: 23/Jul/18 Resolved: 21/Jun/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alex Etling | Assignee: | Kelsey Schubert |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Operating System: | ALL |
| Participants: |
| Description |
Over the last couple of months we have seen the hot secondary on one of our replica sets have a memory spike and then get killed by the Linux OOM killer.

*Setup:*

The load on the hot secondary is the load needed to keep in sync with the primary, plus some queries on 2 collections that exist in this replica set. These queries sometimes include table scans.

Every once in a while our hot secondary's memory usage will spike, causing the oom_killer to kill the running mongod process (see the mongo memory spike screenshot attached).

There are some other strange things going on with the mongo process running out of memory. One is that the WiredTiger cache size does not seem to increase during this time period (see the cache usage image attached). There also seems to be a huge spike in the amount of data read from disk right before the mongod memory usage spikes - about 40 GB worth (see the attached disk read spike). A weird thing here is that those disk reads are on the mounted file system which holds the mongo logs (and is located at `/`), not the one that holds the mongo data.

I have noticed this issue: https://jira.mongodb.org/browse/SERVER-27909, which seems like it could be related?

I have also attached the diagnostic.data logs from during and after the incident below. I believe that the incident should be towards the end of metrics.2018-04-08T00-12-09Z-00000. Let me know if there is any other data you need me to provide. Any help would be greatly appreciated.
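As an illustration of the divergence described above (resident memory climbing while the WiredTiger cache stays flat), here is a minimal monitoring sketch. It is not from this ticket: it assumes a locally reachable mongod, and the host, port, and sampling interval are placeholders. It reads the `serverStatus` fields `mem.resident` and `wiredTiger.cache` / "bytes currently in the cache".

```python
# Minimal sketch (not from this ticket): periodically compare the mongod
# resident set size with the WiredTiger cache size, so a spike like the one
# described above (RSS grows while the cache stays flat) becomes visible.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed host/port

for _ in range(60):  # sample for roughly 10 minutes
    status = client.admin.command("serverStatus")
    resident_mb = status["mem"]["resident"]  # process RSS, reported in MB
    cache_mb = status["wiredTiger"]["cache"]["bytes currently in the cache"] / 1024**2
    print(f"resident={resident_mb} MB  wiredTiger_cache={cache_mb:.0f} MB")
    time.sleep(10)
```

In the incident described here, such a log would show `resident` growing far past the configured cache size while `wiredTiger_cache` stays roughly constant, which is why the heap profiler (requested in the comments below) is needed to find the non-cache allocation source.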
| Comments |
| Comment by Kelsey Schubert [ 21/Jun/18 ] |
We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
| Comment by Kelsey Schubert [ 10/Apr/18 ] |
I've reviewed the metrics files you've provided, but unfortunately do not have enough information to conclusively diagnose this issue. So that we can continue to investigate, would you please restart mongod with `--setParameter heapProfilingEnabled=true`. After encountering the issue again, please upload the following information:

- the complete mongod log files covering the time since the restart
- the `$dbpath/diagnostic.data` directory
These files will record information that should enable us to track the source of the memory increase. For this purpose it is important that we have complete logs and diagnostic.data covering the time since the restart. Since the required files may be too large to attach to this ticket, I've generated a secure upload portal for you to use.

Thank you for your help,
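For readers following the same steps, a small verification sketch is shown below. It is not part of the ticket: it assumes mongod has been restarted with `--setParameter heapProfilingEnabled=true`, that `getParameter` reports this startup-only parameter, and the connection string and file paths are placeholders rather than details from this deployment.

```python
# Minimal sketch (assumptions noted above): after restarting mongod with
#   mongod --setParameter heapProfilingEnabled=true <existing options>
# confirm the parameter took effect before waiting for the issue to recur.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed host/port

# Whether getParameter exposes this startup-only parameter is an assumption here;
# if it does not, check the startup options echoed at the top of the mongod log instead.
result = client.admin.command({"getParameter": 1, "heapProfilingEnabled": 1})
print("heapProfilingEnabled:", result.get("heapProfilingEnabled"))

# Files requested in the comment above (paths are examples, adjust to your deployment):
#   - the complete mongod log file(s), e.g. /var/log/mongodb/mongod.log
#   - the $dbpath/diagnostic.data directory, e.g. /data/db/diagnostic.data
```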