[SERVER-18842] WiredTiger & indexing: "kernel: Out of memory: Kill process 32011 (mongod) score 966 or sacrifice child" Created: 05/Jun/15 Updated: 20/Jun/15 Resolved: 20/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, WiredTiger |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nicolas Fouché | Assignee: | Ramon Fernandez Marina |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
The hidden secondary server I use for backups cannot sync its data from the replica set. It is a replica set of 3 servers, including the one that gets killed by the kernel. All are running mongod 3.0.3. I'm currently trying to migrate from mmapv1 to WiredTiger; the two main servers still use mmapv1. The initial sync goes fine, copying our 292 million documents, with mongod taking around 80% of the RAM. Then it starts building the index on _id. After a few percent, the kernel kills mongod. I attached the logs from /var/log/messages. If I relaunch mongod, it retries building the index, then exits because of (the famous?):
If I relaunch mongod with indexBuildRetry=false, it restarts cloning the collection from the beginning after showing:
This server has only 16GB RAM, and can do a sync with mmapv1 without any problem. The database size is 946GB on mmapv1, and 243GB on WiredTiger (snappy). I started running the following command when the cloning was at 236M docs. I attached the result.
Cloning finishes at 2015-06-05T16:09:38.003+0000. The last log before the kill:
On another replica set, this migration worked fine with a database size of 402GB mmapv1 / 133GB WT. For the record, the sync also took all the RAM, but index creation went fine. What's interesting is that after a restart of mongod on this server, RAM usage went from 100% to 17%. After a few days of continuous writes from the oplog, it's now at 43% RAM. I did not launch any query or make any backup from this secondary. The two replica sets contain the same kind of documents, and the same indexes. My storage configuration:
There's this storage.wiredTiger.engineConfig.cacheSizeGB setting. I don't really understand what it means. I tried leaving it at the default, and I tried setting it to 14. Either way, RAM usage still goes beyond the server capacity. This issue could be related to
In case someone at MongoDB is interested, the servers are monitored by MMS:
|
| Comments |
| Comment by Ramon Fernandez Marina [ 20/Jun/15 ] | ||||||||||||||||
|
Thanks for reporting back, nicolas_, and glad to hear that you were able to complete the initial sync. This problem with memory consumption is addressed in
If you upgrade to 3.0.5 when it comes out, please note that with 16GB of memory it may be too aggressive to let mongod use 14GB for cache, as that may have a negative impact on other parts of the system that need memory (including the filesystem cache and mongod itself). I'd still recommend that you let WiredTiger choose the size of the cache. I'm closing this ticket as a duplicate of
Regards, | ||||||||||||||||
| Comment by Nicolas Fouché [ 14/Jun/15 ] | ||||||||||||||||
|
I attached new graphs showing memory usage during the last days. | ||||||||||||||||
| Comment by Nicolas Fouché [ 10/Jun/15 ] | ||||||||||||||||
|
I attached a new screenshot of Cache Usage and Activity. | ||||||||||||||||
| Comment by Nicolas Fouché [ 10/Jun/15 ] | ||||||||||||||||
|
ramon.fernandez, it worked with cacheSizeGB: 1. Here are log excerpts of the sync:
I attached a screenshot of RAM usage over time.
I also recorded an ss.log; would you like me to share it? So... questions:
| ||||||||||||||||
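For readers landing on this ticket, here is a minimal sketch of where the cacheSizeGB setting reported to work in the comment above would sit in a YAML mongod configuration file. It is not the reporter's actual configuration, which did not survive in this export: the dbPath value is a hypothetical placeholder, and the other values only mirror what the ticket text mentions (WiredTiger, snappy, cacheSizeGB: 1).

```yaml
# Illustrative mongod.conf fragment (assumed layout, not the reporter's file).
storage:
  dbPath: /var/lib/mongodb        # hypothetical path, adjust to your host
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1              # value the reporter says allowed the sync to complete
    collectionConfig:
      blockCompressor: snappy     # compression mentioned in the description
```

Leaving out the cacheSizeGB line lets WiredTiger choose its own cache size, which is the approach the assignee recommends elsewhere in this thread.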
| Comment by Ramon Fernandez Marina [ 05/Jun/15 ] | ||||||||||||||||
|
nicolas_, you may be running into
Thanks, |