[SERVER-18842] WiredTiger & indexing: "kernel: Out of memory: Kill process 32011 (mongod) score 966 or sacrifice child" Created: 05/Jun/15  Updated: 20/Jun/15  Resolved: 20/Jun/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance, WiredTiger
Affects Version/s: 3.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nicolas Fouché Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Cache Usage and Activity 1GB.png     PNG File Memory usage over time with WiredTiger.png     PNG File WT sync with cacheSizeGB at 1.png     File ss.log.gz    
Issue Links:
Duplicate
duplicates SERVER-18829 Cache usage exceeds configured maximu... Closed
Operating System: ALL
Participants:

 Description   

The hidden secondary server I use for backups cannot sync its data from the replica set. It's a replica set of 3 servers, including the one that gets killed by the kernel. All are running mongod 3.0.3.

I'm currently trying to migrate from mmapv1 to WiredTiger. The two main servers use mmapv1. The initial sync goes fine, copying our 292 million documents, and mongod takes around 80% of the RAM. Then it starts building the index on _id. After a few percent of progress, the kernel kills mongod. I attached the logs from /var/log/messages.

If I relaunch mongod, it retries building the index, then exits with the (famous?) error:

Index Build "Index rebuilding did not complete: 11000 E11000 duplicate key error collection: rum.reports index: _id_ dup key: { : "11jjgtfncmtn9w47mco7vtbbj" }"

If I relaunch mongod with indexBuildRetry=false, it restarts cloning the collection from the beginning after showing:

2015-06-05T16:17:37.683+0000 W REPL     [ReplicationExecutor] Failed to load timestamp of most recently applied operation; NoMatchingDocument Did not find any entries in local.oplog.rs
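
For reference, disabling the retry just means flipping the option that is already in my storage config below (a minimal sketch; I believe the command-line equivalent is --noIndexBuildRetry):

storage:
  indexBuildRetry: false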

This server has only 16GB RAM, and can do a sync with mmapv1 without any problem. The database size is 946GB on mmapv1, and 243GB on WiredTiger (snappy).

I started running the following command when the cloning was at 236M docs. I attached the result.

mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(10*1000)}" > ss.log

Cloning finishes at 2015-06-05T16:09:38.003+0000.
Index creation starts at 2015-06-05T16:09:45.606+0000.

The last log before the kill:

2015-06-05T16:12:00.565+0000 I -        [rsSync]   Index Build: 13494400/292298635 4%

On another replica set, this migration worked fine with a database size of 402GB mmapv1 / 133GB WT. For the record, the sync also took all the RAM, but index creation went fine. What's interesting is that after a restart of mongod on this server, RAM usage went from 100% to 17%. After a few days of continuous writes from the oplog, it's now at 43% RAM. I did not launch any query or make any backup from this secondary.

The two replica sets contain the same kind of documents, and the same indexes.

My storage configuration:

storage:
  dbPath: /a/b/c
  indexBuildRetry: true
  journal:
    enabled: true
  directoryPerDB: false # default
  syncPeriodSecs: 60 # default
 
  engine: wiredTiger
 
  wiredTiger:
    engineConfig:
#      cacheSizeGB: 12 # Default: the maximum of half of physical RAM or 1 gigabyte.
      statisticsLogDelaySecs: 0 # If set to 0, WiredTiger does not log statistics.
      journalCompressor: snappy # Default: snappy
      directoryForIndexes: false # Default: false
    collectionConfig:
      blockCompressor: snappy # Default: snappy
    indexConfig:
      prefixCompression: true # Default: true

There's this storage.wiredTiger.engineConfig.cacheSizeGB setting. I don't really get what it means. I tried leaving it at the default and setting it to 14. RAM usage still goes beyond the server's capacity.
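
For illustration, setting it explicitly just means uncommenting the line in the engineConfig section above (a sketch; 14 is simply the value I tried on this 16GB server, not a recommendation):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 14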

This issue could be related to SERVER-17456 and SERVER-17424. I created a new issue because this one happens during an initial sync, not during benchmarks.

In case someone at MongoDB is interested, the servers are monitored by MMS:

  • The failing group is 556dc2ffe4b0ba305ae49354. The hidden Secondary is the only one running on WiredTiger. It's the one that I try to sync again and again.
  • The ok group is 556dce19e4b0eea5366e6ae5. The hidden Secondary is the only one running on WiredTiger.


 Comments   
Comment by Ramon Fernandez Marina [ 20/Jun/15 ]

Thanks for reporting back nicolas_, and glad to hear that you were able to complete the initial sync. This problem with memory consumption is addressed in SERVER-18829 and will be part of MongoDB 3.0.5, scheduled for the coming weeks.

If you upgrade to 3.0.5 when it comes out, please note that with 16GB of memory it may be too aggressive to let mongod use 14GB for cache: that can have a negative impact on other parts of the system that need memory (including the filesystem cache and mongod itself). So I'd still recommend you let WiredTiger choose the size of the cache.

I'm closing this ticket as a duplicate of SERVER-18829. Feel free to watch that ticket for updates on when it gets merged into the MongoDB tree.

Regards,
Ramón.

Comment by Nicolas Fouché [ 14/Jun/15 ]

I attached new graphs showing memory usage over the last few days.
Despite continuous upserts on a DB that is 10x the size of RAM, memory usage is steady.
So I guess I have to set a small value first, and then grow it incrementally until memory usage is steady at around 80% of the total RAM.
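
While growing the value, a rough way to watch actual cache usage against the configured maximum is the same serverStatus trick as before, just narrowed to the WiredTiger cache counters (a sketch; field names as I understand the serverStatus output):

mongo --eval "while(true) { var c = db.serverStatus().wiredTiger.cache; print(c['bytes currently in the cache'] + ' / ' + c['maximum bytes configured']); sleep(10*1000) }"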

Comment by Nicolas Fouché [ 10/Jun/15 ]

I attached a new screenshot of Cache Usage and Activity.
I now see what cacheSizeGB is for. The cache is capped at 1GB; it does not lie.

Comment by Nicolas Fouché [ 10/Jun/15 ]

ramon.fernandez, it worked with cacheSizeGB: 1.

Here are log excerpts of the sync:

2015-06-09T15:17:49.194+0000 I REPL     [rsSync] initial sync clone all databases
2015-06-09T22:50:24.279+0000 I STORAGE  [rsSync] clone rum.reports 294194175
2015-06-09T22:50:56.879+0000 I INDEX    [rsSync] build index on: rum.reports properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "rum.reports" }
2015-06-09T22:50:56.879+0000 I INDEX    [rsSync] building index using bulk method
2015-06-09T22:50:59.000+0000 I -        [rsSync]   Index Build: 370200/294505784 0%
2015-06-10T04:21:08.002+0000 I -        [rsSync]   Index Build: 294551500/294564782 99%
2015-06-10T04:21:19.000+0000 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 17125700/294564782 5%
2015-06-10T04:23:59.000+0000 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 278710300/294564782 94%
2015-06-10T04:24:07.880+0000 I INDEX    [rsSync]   done building bottom layer, going to commit
2015-06-10T04:24:10.977+0000 I INDEX    [rsSync] build index done.  scanned 294564782 total records. 6359 secs
2015-06-10T04:24:11.287+0000 I REPL     [rsSync] oplog sync 3 of 3
2015-06-10T04:34:35.329+0000 I REPL     [rsSync] initial sync finishing up
2015-06-10T04:34:35.329+0000 I REPL     [rsSync] replSet set minValid=5577bbe4:1f
2015-06-10T04:34:35.354+0000 I REPL     [rsSync] initial sync done
2015-06-10T04:34:35.396+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-06-10T04:34:35.397+0000 I REPL     [ReplicationExecutor] transition to SECONDARY

I attached a screenshot of RAM usage over time.
To recap:

  • Cloning starts at 15:17.
  • Index build starts at 22:50.
  • BTree Bottom Up starts at 04:21 (with a small spike in virtual memory).
  • Oplog sync starts at 04:24.
  • Becomes secondary at 04:34.
  • Restarting mongod at 08:22. I did that just to check how RAM usage would change.

I also captured an ss.log; would you like me to share it?
In case you want to check on MMS, I did the sync on a temporary server at https://mms.mongodb.com/host/detail/556dc2ffe4b0ba305ae49354/e4579de15068a3f52d5cef56e82d9977 .

So... questions:

  • Resident memory went as high as 2.03GB; how is this value related to cacheSizeGB (which is set to 1GB)?
  • I still don't know whether RAM usage will keep growing over time until mongod gets killed by the kernel.
  • If cacheSizeGB keeps RAM usage under control, how can I determine the highest value I can set for optimal performance?
  • All in all, this killed-by-kernel thing is really scary. How can I be confident in WiredTiger knowing this can happen? Should I supervise mongod's RAM usage and restart it when it reaches 90%? You can guess that no one wants to do that.

Comment by Ramon Fernandez Marina [ 05/Jun/15 ]

nicolas_, you may be running into SERVER-18829. Can you please try lowering the size of the WT cache to, say, 1GB and see if the initial sync completes?
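
For example, something like this under storage.wiredTiger.engineConfig in the config file (or the --wiredTigerCacheSizeGB command-line option) should do it; a minimal sketch:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1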

Thanks,
Ramón.
