[SERVER-22000] MongoDB version 3.2; WiredTiger is killed by OOM killer Created: 23/Dec/15  Updated: 14/Apr/17  Resolved: 08/Feb/16

Status: Closed
Project: Core Server
Component/s: Stability, WiredTiger
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Amanpreet Singh Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: WTmem, WTplaybook
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cursors.png     PNG File cursors2.png     PNG File memory.png     PNG File mms_stats.png    
Issue Links:
Duplicate
duplicates SERVER-19966 Excessive cursor caching in integrati... Closed
Operating System: ALL
Participants:

 Description   

I just upgraded MongoDB from 3.0.8 to 3.2 and changed the storage engine to WiredTiger. Now the primary of my replica set crashes almost every day.

There's no error in the mongod log. Tailing dmesg shows this:

[1328492.617946] [30005]   500 30005    30032     2611      41       3        0             0 mongo
[1328492.622599] Out of memory: Kill process 22659 (mongod) score 970 or sacrifice child
[1328492.626892] Killed process 22659 (mongod) total-vm:32870580kB, anon-rss:30431520kB, file-rss:0kB

The nproc, ulimit, etc. values are set. The instance has 30 GB of RAM, and the same instance used to run mongod with MMAPv1 without any issues.
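
A minimal sketch, assuming the mongo shell on the affected 3.2 node, of comparing mongod's resident memory against the WiredTiger cache (field names as reported by serverStatus):

var s = db.serverStatus();
print("resident MB:    " + s.mem.resident);                                      // OS-resident memory, in MB
print("virtual MB:     " + s.mem.virtual);                                       // virtual memory, in MB
print("WT cache bytes: " + s.wiredTiger.cache["bytes currently in the cache"]);  // data held in the WiredTiger cache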



 Comments   
Comment by Ramon Fernandez Marina [ 14/Apr/17 ]

pieterwjordaanpc@gmail.com, the graphs are generated by internal tools. You can find information about FTDC implementation here.

Comment by Pieter Jordaan [ 14/Apr/17 ]

Though not directly relevant, how are the graphs generated from the diagnostic data? I've seen many Jira entries showing them but don't know how to create them.

Comment by Ramon Fernandez Marina [ 08/Feb/16 ]

amanpreet@codigami.com, we believe you're running into SERVER-19966:

At A, B, and C we see jumps in the WT "open cursor count", and at those points or shortly thereafter we see jumps in "current_allocated_bytes" that account for a large part of the excess.

I'd recommend you review the places in your application that need to open that many cursors and see if they can be cleaned up after use. Depending on your use case, an alternative may be to lower the idle cursor timeout as described in SERVER-8188.
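
A minimal sketch of both checks from the mongo shell (the 300000 ms value is only illustrative; the server default for cursorTimeoutMillis is 600000):

db.serverStatus().metrics.cursor.open.total                        // cursors currently open on this node
db.serverStatus().metrics.cursor.open.noTimeout                    // cursors opened with noCursorTimeout set
db.adminCommand({setParameter: 1, cursorTimeoutMillis: 300000})    // lower the idle cursor timeout to 5 minutes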

I'm going to resolve this ticket as a duplicate of SERVER-19966; feel free to watch it for updates.

Regards,
Ramón.

Comment by Tomasz Pajor [ 27/Jan/16 ]

Actually, if you could do this with 3.2.1, it would be great.

Comment by Amanpreet Singh [ 27/Jan/16 ]

Okay. Will do that.

Comment by Tomasz Pajor [ 27/Jan/16 ]

Sad to hear that. I would suggest lowering the cache size to 8 GB and monitoring memory closely. If you could provide memory stats a week or so after you lowered it, that would be very helpful.

Comment by Amanpreet Singh [ 27/Jan/16 ]

Just now, it crashed with a 15 GB cache size as well. This was running mongod 3.0.8.

Comment by Amanpreet Singh [ 26/Jan/16 ]

Hi Tomasz,

I reduced cache_size from 17 GB to 15 GB. I could not test with 3.2.1 yet, since it's a production database and I can't afford to see it having problems again.

Comment by Tomasz Pajor [ 25/Jan/16 ]

Hello,

Does this still happen with 3.2.1? How drastically did you have to reduce cache_size?

Comment by Amanpreet Singh [ 25/Jan/16 ]

Hi,

Any update on this? I see the status is still "debugging with submitter". I have downgraded mongod to 3.0.8 and reduced the WiredTiger cache_size to deal with this issue for now.
I don't have any other info to add.

Comment by Amanpreet Singh [ 11/Jan/16 ]

Yes, that's correct. It's running only mongod process.

Comment by Tomasz Pajor [ 11/Jan/16 ]

This is just a suggestion from my end; I don't know if it will fix your issue. So on this instance you are only running the mongod server, and nothing else that might consume resources?

Comment by Amanpreet Singh [ 11/Jan/16 ]

The relevant log output is:
wiredtiger_open config: create,cache_size=17G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),

This instance is an r3.xlarge on AWS with ~30 GB of memory. But if this is the issue, I'm surprised the default is set so high.

Comment by Tomasz Pajor [ 11/Jan/16 ]

From documentation (https://docs.mongodb.org/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB)

Changed in version 3.2: With WiredTiger, MongoDB utilizes both the filesystem cache and WiredTiger cache. By default, starting in MongoDB 3.2, the WiredTiger cache will use 60% of RAM minus 1 GB or it will use 1 GB, whichever is larger. For systems with up to 10 GB of RAM, this is less than or equal to the 3.0 setting. For systems with more than 10 GB of RAM, the configuration is greater than the 3.0 setting.

For me it helped to lower it from the 60% default to 8 GB, because based on the 60% it was being set to 18 GB. Can you show me the log output? Please look for a line like "wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0)".
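
On a ~30 GB instance the 3.2 default works out to roughly (0.6 * 30 GB) - 1 GB = 17 GB, which matches the cache_size=17G in the log line quoted above. A minimal sketch of reading the configured and current cache size from the mongo shell (stat names as reported by serverStatus):

db.serverStatus().wiredTiger.cache["maximum bytes configured"]        // configured WiredTiger cache size, in bytes
db.serverStatus().wiredTiger.cache["bytes currently in the cache"]    // data currently held in the cache, in bytes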

Comment by Amanpreet Singh [ 10/Jan/16 ]

Hi Tomasz,

How do I verify if it's due to cacheSizeGB?

I haven't manually set the cacheSizeGB value. The instance is running just one mongod instance and nothing else. Shouldn't the default value be okay?

Comment by Tomasz Pajor [ 10/Jan/16 ]

Can you please verify whether this is happening due to cacheSizeGB? I had a similar problem and changing it fixed it for me.

Comment by Amanpreet Singh [ 31/Dec/15 ]

Hi Dan,

rs.status() doesn't show any information about protocolVersion. But I believe it would be version 0, since I just upgraded mongod from 3.0.8 to 3.2 on each member one by one, and the replica set was initialized before version 3.2.

Comment by Daniel Pasette (Inactive) [ 31/Dec/15 ]

You can check by calling rs.status() in the mongo shell and checking for the protocol version in the output. If you have simply upgraded, or are using a replica set with mixed versions, then you are using protocol version 0.
See: https://docs.mongodb.org/manual/release-notes/3.2/#rel-notes-rs-enhancements
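
A minimal sketch of checking this from the mongo shell; in practice the protocolVersion field is part of the replica set configuration document, so rs.conf() is where it appears (if the field is absent on an upgraded set, the set is still using the pre-3.2 default, version 0):

rs.conf().protocolVersion    // 1 for the new protocol; absent or 0 for the pre-3.2 protocol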

Thanks for the additional information, we are continuing to investigate the issue.

Comment by Amanpreet Singh [ 29/Dec/15 ]

Hi Bruce,

I'll try to answer the questions as much as possible:

  • I'm not using SSL.
  • I'm not using authentication.
  • The cluster is a replica set (not sharded) consisting of 1 primary and 2 secondaries, each with priority: 1 and votes: 1.
  • I'm attaching an image of MMS stats for the last 12 hours, during which there were crashes on two different members. The middle one is version 3.0.8, which crashed for the first time last night; the others are version 3.2, and they crash a few hours after becoming the primary member. In case you don't know, a red vertical bar indicates a server restart (a crash in my case) and an orange vertical bar indicates the server becoming primary.

  • I don't see any info about replication protocol in rs.conf(). Let me know if there's any other way to check it.
  • We're doing insert, update, query.

Thanks. Let me know if you need anything else.

Best,
Aman

Comment by Bruce Lucas (Inactive) [ 28/Dec/15 ]

Hi Aman,

Understood. We may have identified another system internally that is showing similar behavior. For comparison purposes can you tell us generally about your setup:

  • Are you using SSL?
  • Are you using authentication, and if so what kind?
  • What is your cluster topology - how many secondaries, what roles? Sharded?
  • Do you see similar memory growth on secondaries?
  • Which replication protocol?
  • Generally what kinds of operations are you doing - insert, update, query, aggregation, map/reduce, geo queries, full-text search, etc.?

Thanks,
Bruce

Comment by Amanpreet Singh [ 24/Dec/15 ]

Hi Bruce,

I'm afraid providing logs won't be possible, since some of the logged operations contain sensitive data.
The company has a strict policy regarding sharing such data.

Best,
Aman

Comment by Bruce Lucas (Inactive) [ 24/Dec/15 ]

Hi Aman,

Thanks for providing that data. Here is a timeline of memory usage:

  • virtual memory climbs to about 32 GB, resident to about 29 GB, accounted for by a tcmalloc heap size of about 31 GB (all read from the "max" column)
  • but of that 31 GB heap size, only about 19 GB is actually allocated; most of the remainder is accounted for by about 11 GB of free memory in the tcmalloc page heap. This is a sign of likely memory fragmentation, a known issue tracked by SERVER-20306.
  • however, of the 19 GB of allocated memory, only 14 GB is accounted for by the WT cache. The additional 5 GB is an unexpectedly large amount of memory to find outside the WT cache. Moreover, it is being held long-term, suggesting it has either been leaked or is being held unexpectedly by some data structure.
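
A minimal sketch of reading the same figures from a running node (mongo shell; tcmalloc and WiredTiger stat names as reported by serverStatus on 3.2):

var s = db.serverStatus({tcmalloc: true});
print("tcmalloc heap_size:       " + s.tcmalloc.generic.heap_size);                       // total heap obtained from the OS
print("current_allocated_bytes:  " + s.tcmalloc.generic.current_allocated_bytes);         // memory actually allocated
print("pageheap_free_bytes:      " + s.tcmalloc.tcmalloc.pageheap_free_bytes);            // free memory retained in the page heap
print("WT cache bytes:           " + s.wiredTiger.cache["bytes currently in the cache"]); // data held by the WiredTiger cache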

We see increases in allocated memory at some specific times, for example around A and B above. I can't find anything in the stats to account for those increases, but perhaps if we look in the mongod log we will find some clues. Would you be able to upload the mongod log files corresponding to this timeframe (the mongod restart at around 15-21-21 08:00 UTC, and the following day or so)?

Thanks,
Bruce

Comment by Amanpreet Singh [ 24/Dec/15 ]

Hey Bruce,

I have uploaded the diagnostic.data tarball at the private link you provided. Let me know if you find anything

Thanks!

Best,
Aman

Comment by Bruce Lucas (Inactive) [ 23/Dec/15 ]

Hi Amanpreet,

I understand your concern. We don't at the moment have a viewer for the data that is ready for external use. However, you can see the data it collects here; it consists of the following, with the commands you can run from the mongo shell to see for yourself what is collected:

serverStatus: db.serverStatus({tcmalloc: true})
replSetGetStatus: rs.status()
collStats for local.oplog.rs: db.getSiblingDB('local').oplog.rs.stats()
getCmdLineOpts: db.adminCommand({getCmdLineOpts: true})
buildInfo: db.adminCommand({buildInfo: true})
hostInfo: db.adminCommand({hostInfo: true})

If you are willing to share that information with us (with the assurance that we will keep it private) but don't wish to post it on this public ticket, you can upload it here.

Thanks,
Bruce

Comment by Amanpreet Singh [ 23/Dec/15 ]

Hi Bruce,

Thanks for adding the comment.

The files in the diagnostic.data directory look like binary files. Could you please tell me how to view this data? I don't want to unknowingly share any secret data.

Best,
Aman

Comment by Bruce Lucas (Inactive) [ 23/Dec/15 ]

Hi Amanpreet,

Can you please archive and attach to this ticket the $dbpath/diagnostic.data directory from the affected node? This will contain periodically collected serverStatus data, including memory statistics, that will help us diagnose the cause.
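
If it helps locate that directory, a minimal sketch of reading the configured dbPath from the mongo shell (the storage.dbPath field is present only when it was set via the config file or command line; otherwise the default of /data/db applies):

db.adminCommand({getCmdLineOpts: 1}).parsed.storage.dbPath    // configured data directory; diagnostic.data lives inside it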

Thanks,
Bruce
