[SERVER-20068] Mongodb 3.0.5 with wiredTiger causing Out of memory Issues Created: 20/Aug/15 Updated: 25/Aug/15 Resolved: 25/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Praveen Akinapally | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Deploy a MongoDB 3.0.5 sharded replica set environment. Run four shard servers (each the primary in its replica set) on the same machine, each with a 13 GB cache size. Run write-intensive jobs; over time the memory used by the shard servers increases and finally leads to a system kill due to out of memory. |
| Participants: |
| Description |
|
We have upgraded our replicated sharded MongoDB setup to the latest 3.0.5 in the hope of fixing the OOM issues we have had since we migrated our storage engine from MMAP to WT, but our memory usage issues didn't go away. Memory usage increases over time and only a restart releases the allocated memory. We are running 4 shards on an Ubuntu server (the primary instance in our 3-member replica set) with 60 GB of system memory and WT as the storage engine. We set the cache size to 13 GB for each shard server, leaving 8 GB of memory for system processes and for anything else Mongo needs (open cursors, open sessions, etc.). But it uses way more, and the system kills the process. Two of the four shard servers running on our primary instance failed with an OOM error due to a system kill. Please find attached the db.serverStatus({tcmalloc:true}) output captured for all four shard servers running on the primary, starting 1 hour before the failure occurred. Also attached is the syslog, which logged the system kill actions for the two shard servers. |
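For readers working with similar captures, here is a minimal sketch of how one might compare the WiredTiger cache against the total heap in a parsed `db.serverStatus({tcmalloc:true})` document. It assumes the output has already been loaded into a Python dict and that it contains the `tcmalloc.generic` and `wiredTiger.cache` fields used below. The helper name `summarize_memory` and the sample numbers are hypothetical and are not taken from the attached data.

```python
GIB = 1024 ** 3


def summarize_memory(status):
    """Split mongod heap usage into WiredTiger cache and everything else.

    `status` is a dict shaped like serverStatus output. All results are in GiB.
    """
    generic = status["tcmalloc"]["generic"]
    cache = status["wiredTiger"]["cache"]

    allocated = generic["current_allocated_bytes"]  # bytes the process has allocated
    heap = generic["heap_size"]                     # bytes tcmalloc holds from the OS
    in_cache = cache["bytes currently in the cache"]
    cache_limit = cache["maximum bytes configured"]

    return {
        "cache_limit_gib": cache_limit / GIB,
        "cache_used_gib": in_cache / GIB,
        # Memory used outside the cache: connections, cursors, sessions, etc.
        "non_cache_gib": (allocated - in_cache) / GIB,
        # Memory the allocator holds but has not handed out (free lists, fragmentation).
        "allocator_overhead_gib": (heap - allocated) / GIB,
    }


# Hypothetical sample values, for illustration only.
sample_status = {
    "tcmalloc": {"generic": {"current_allocated_bytes": 17 * GIB, "heap_size": 18 * GIB}},
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 13 * GIB,
            "maximum bytes configured": 13 * GIB,
        }
    },
}

if __name__ == "__main__":
    for name, value in summarize_memory(sample_status).items():
        print(f"{name}: {value:.1f}")
```

A `non_cache_gib` figure that keeps growing while `cache_used_gib` stays at its limit would suggest the growth is outside the WiredTiger cache, which the cache size setting does not bound.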
| Comments |
| Comment by Ramon Fernandez Marina [ 25/Aug/15 ] |
|
praveenak, I see a large number of open cursors, which may or may not be contributing to the problem. In MongoDB 3.0.6 we added a palliative fix for this issue. That being said, I think the main cause of this particular problem was that the machine's memory was being overtaxed by running too many servers on it, so I'm going to close this issue for now. If running on a more sensible configuration and upgrading to 3.0.6 do not help with this issue, please feel free to open a new ticket and we'll take it from there. Regards, |
| Comment by Praveen Akinapally [ 21/Aug/15 ] |
|
Thanks for looking into it. I will decrease the cache size and see. I'm interested to know what your findings from the logs are. Regards, |
| Comment by Ramon Fernandez Marina [ 20/Aug/15 ] |
|
Thanks for your report praveenak. I'm afraid there's a lot of confusion over this topic, so I'll try to clarify things: setting the WiredTiger cache size to 13GB doesn't mean that mongod will only use 13GB – it means that WiredTiger should not use more than 13GB for its cache, but mongod will use memory for other things (connections, session cache, cursor cache...). If I understand your setup correctly, you're running a 4-shard system on one box. If that's correct, up to 52GB may be used for the WiredTiger cache, thus leaving only 8GB to run four mongod data-bearing processes, config servers, mongos processes, the operating system and whatever other applications may be running. Without proper configuration (e.g. per-process memory limitations, large amounts of swap available) the system is bound to run out of memory and kill whichever process is the most memory-hungry – most likely a mongod. That being said, we'll take a look at the data you collected to make sure there are no other memory issues lurking around – thanks for collecting this data in advance. Regards, |
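The arithmetic in this comment can be sketched as a small budget calculation. The 60 GB, 13 GB and 4-process figures come from the ticket. The function names and the reserve values in the second call are hypothetical, chosen only to illustrate how a smaller cache size might be derived; they are not a MongoDB recommendation.

```python
def headroom_gb(total_ram_gb, cache_gb_per_mongod, mongod_count):
    """RAM left over once every WiredTiger cache has filled to its limit."""
    return total_ram_gb - cache_gb_per_mongod * mongod_count


def cache_size_for(total_ram_gb, mongod_count, overhead_gb_per_mongod, system_reserve_gb):
    """Largest per-mongod cache size that still leaves the stated reserves free.

    overhead_gb_per_mongod: assumed non-cache memory per mongod
        (connections, session cache, cursor cache...).
    system_reserve_gb: assumed memory for the OS and other processes.
    """
    available = total_ram_gb - system_reserve_gb - overhead_gb_per_mongod * mongod_count
    return available / mongod_count


if __name__ == "__main__":
    # Figures from the ticket: 60 GB box, four mongods, 13 GB cache each.
    left = headroom_gb(60, 13, 4)
    print(f"Headroom with 13 GB caches: {left} GB total, {left / 4} GB per mongod")

    # Hypothetical reserves: 4 GB of non-cache memory per mongod, 8 GB for the system.
    print(f"Cache size under those reserves: {cache_size_for(60, 4, 4, 8)} GB")
```

With the ticket's settings the four caches alone can claim 52 GB, so each mongod has on average only 2 GB for everything outside the cache, and that share also has to cover the operating system and any other processes on the box.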