[SERVER-40595] memory usage saturation Created: 11/Apr/19  Updated: 11/Sep/19  Resolved: 11/Sep/19

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.6.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Hughes Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-06-06 at 11.38.22 AM.png     PNG File Screen Shot 2019-06-06 at 11.41.14 AM.png     PNG File Screen Shot 2019-06-06 at 11.42.27 AM.png     PNG File Screen Shot 2019-07-16 at 11.25.29 AM.png    
Operating System: ALL
Participants:

 Description   

A production system that had been working fine slowed to a crawl and kicked users out after overall memory usage grew to 99%.

We are currently on 3.6.10 

Single server, no replica set

Config has wiredTiger.cacheSizeGB: 13

Total system memory is 24gb

values reported by mongostat were

vsize  40gb

res  19gb

dirty .8%

used 80%

I have the metrics files from diagnostic.data for the time frame. Are there tools available yet so I can analyze them?



 Comments   
Comment by Danny Hatcher (Inactive) [ 11/Sep/19 ]

It is possible that there is a bug in the way our Windows memory metrics are reported but, as you've said, that's a little beyond the scope of this particular ticket. If you think you can reproduce the incorrect reporting manually, then I recommend you open a new ticket where we can dig in. For now I'll close this ticket.

Comment by Matt Hughes [ 04/Sep/19 ]

The oddity, though, is that the mongod process does not show as the one using the memory, yet the memory clears when mongod is stopped, which I guess at this point is outside the scope of this Jira. We run mongod as the sole process on a Windows server, and from what I've seen you cannot limit memory usage per process very well in Windows. We haven't seen a system completely crash like the one originally reported since moving to 3.6.12 and 3.6.13, so I suppose this can be closed. It's peculiar that the mongod process doesn't report all of the memory being used when looking at Windows Task Manager or perfmon.

Comment by Danny Hatcher (Inactive) [ 04/Sep/19 ]

mhughes, have you had a chance to read my previous comment?

Comment by Danny Hatcher (Inactive) [ 06/Aug/19 ]

While MongoDB will utilize the --wiredTigerCacheSizeGB setting for WiredTiger itself, the server will also utilize the filesystem cache to keep compressed data files in memory as well as for other operations. Thus, it is expected and normal for MongoDB to eventually grow and acquire all free memory on a system. For this reason, virtual memory is not an important aspect in diagnosing MongoDB issues. We recommend running a mongod process as the sole significant process on any given server (or restrain resource usage via OS settings) so that other programs do not interfere with it running.

That being said, we do not expect resident memory to rise high enough to cause out-of-memory issues except for extreme situations. If you think you are running into memory issues (other than virtual memory rising high for the process), please upload the diagnostics to the Support Uploader link you were provided earlier along with specific timestamps and we can try to see if there is a bug present.
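
To make the split described in this comment concrete, here is a minimal arithmetic sketch in Python. It assumes the reporter's figures from the description (24 GB of RAM, cacheSizeGB of 13) and the default-cache rule as I understand it from the MongoDB documentation for 3.4 and later (the larger of 50% of (RAM - 1 GB) or 256 MB). The function names are hypothetical and chosen only for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Default internal cache size, assuming the documented rule:
    the larger of 50% of (RAM - 1 GiB) or 256 MiB."""
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


def memory_outside_cache_bytes(total_ram_bytes: int, cache_bytes: int) -> int:
    """RAM left for everything that is not the WiredTiger cache:
    connections, aggregation buffers, the OS, and the filesystem cache."""
    return total_ram_bytes - cache_bytes


total_ram = 24 * GIB   # total system memory from the description
configured = 13 * GIB  # wiredTiger.cacheSizeGB from the description

print(default_wiredtiger_cache_bytes(total_ram) / GIB)          # 11.5
print(memory_outside_cache_bytes(total_ram, configured) / GIB)  # 11.0
```

Under these assumptions the configured 13 GB cache is slightly above what the default would have been on this host, leaving about 11 GB for all other uses, including the filesystem cache that the comment describes.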

Comment by Matt Hughes [ 31/Jul/19 ]

Windows Task Manager, Resource Monitor, perfmon, and the image above. If we have the WiredTiger cache limited to 13gb, why is the virtual memory usage showing so much higher? I have a site where I recently added a secondary member, converting it from a standalone to a replica set. It had 1gb free out of 34gb the day after the secondary got in sync, and only 18gb was reported in perfmon, with nothing else showing the rest as used. Is this just normal behaviour then? I'm just trying to understand why the mongod process shows only a smaller share of the usage in the OS while nothing else has the memory allocated; stop mongo and the extra memory usage goes away.

Comment by Danny Hatcher (Inactive) [ 30/Jul/19 ]

mhughes, have you had a chance to review eric.sedor's latest comment?

Comment by Eric Sedor [ 16/Jul/19 ]

Thanks mhughes; can you please clarify exactly how you are measuring free memory?

For reference, here is memory use as reported in the provided diagnostic data. We are going to look for anything unusual but ideally we will need to see a specific incident to determine if a given amount of memory use is the result of a bug.

Comment by Matt Hughes [ 15/Jul/19 ]

We haven't had any sites use up all memory lately, but I uploaded diagnostic data from a site where some memory couldn't be tied to any specific process. These sites currently seem stable, but there is only 1-2gb free and it seems to be hanging around there.

Comment by Eric Sedor [ 11/Jul/19 ]

Hi,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide diagnostic data, logs, and timestamps for additional occurrences of the behavior you are seeing?

Thank you!

Comment by Eric Sedor [ 18/Jun/19 ]

Hi mhughes, even though we can't clearly point to where a memory limitation is being reached, we do see some large swings in session counts and we recommend patching to 3.6.12 to obtain the fix for SERVER-39932.

So far, we have not been able to confirm that MongoDB was running out of memory, as resident memory was only 19GB out of the available 24 GB of RAM. But we do see growing memory use and think it is related to an increase in aggregation operations, which would be responsible for memory use outside of the WiredTiger cache.

Have you seen other occurrences like this that you could provide diagnostic data for?
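
The comparison in this comment (resident memory versus the WiredTiger cache) can be sketched in Python as follows. The function takes a serverStatus-shaped dictionary; the field names (`mem.resident` in MiB, and `wiredTiger.cache` entries "bytes currently in the cache" and "maximum bytes configured") are the ones serverStatus reports, while `summarize_memory` and the sample values are hypothetical and only mirror the figures quoted in this ticket (res 19 GB, cache 13 GB, used 80%). Subtracting cache bytes from resident memory is a rough estimate, not an exact accounting.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def summarize_memory(server_status: dict) -> dict:
    """Roughly split resident memory into WiredTiger cache and everything else."""
    cache = server_status["wiredTiger"]["cache"]
    resident = server_status["mem"]["resident"] * MIB  # mem.resident is in MiB
    in_cache = cache["bytes currently in the cache"]
    max_cache = cache["maximum bytes configured"]
    return {
        "resident_gib": resident / GIB,
        "cache_used_gib": in_cache / GIB,
        "cache_fill_pct": 100 * in_cache / max_cache,
        # Estimate only: allocator overhead means cache bytes are not
        # an exact subset of resident memory.
        "outside_cache_gib": (resident - in_cache) / GIB,
    }


# Illustrative document shaped like the numbers in this ticket, not real output.
sample = {
    "mem": {"resident": 19 * 1024, "virtual": 40 * 1024},
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 13 * GIB,
            "bytes currently in the cache": int(0.8 * 13 * GIB),
        }
    },
}

for key, value in summarize_memory(sample).items():
    print(f"{key}: {value:.1f}")
```

Against a live server, the same dictionary could be obtained with a driver call such as pymongo's `client.admin.command("serverStatus")`. With the sample values, roughly 8.6 GB of resident memory sits outside the cache, which is the portion the comment attributes to operations such as aggregations.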

Comment by Eric Sedor [ 06/Jun/19 ]

Thanks mhughes; we're analyzing these files and will get back to you with our thoughts and any questions we have.

Comment by Matt Hughes [ 30/May/19 ]

Files have been uploaded. It looks like I neglected to mention previously that all of our applications had been stopped (they had to be, because nothing was responsive). To get memory back, the mongo service had to be restarted.

Comment by Eric Sedor [ 29/May/19 ]

Hi,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the requested diagnostic data?

If you still have privacy concerns, you may feel more comfortable using this support loader endpoint. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Eric

Comment by Eric Sedor [ 12/Apr/19 ]

The contents of the directory are described here.

Comment by Matt Hughes [ 12/Apr/19 ]

Posted here because I couldn't see any reason why there was so much more memory usage by mongo past what the wiredTiger cache was set to in config and I don't see any tools available anywhere to analyze the diagnostic files. I can post them but can you confirm they have no data about what's stored in the database inside of them? Our data contains patient health information and if it contains data I cannot post them publicly.

Comment by Eric Sedor [ 11/Apr/19 ]

Hi mhughes,

Do you have reason to suspect a bug in the MongoDB server? The SERVER project is for bugs and feature suggestions for the MongoDB server. If you archive (tar or zip) the $dbpath/diagnostic.data directory (described here) we can examine it for the purposes of determining if the issue is a bug.

However, a better place to start given seemingly poor performance or high memory use for a particular workload is to engage the MongoDB community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Does this make sense?
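
The archiving step requested in this comment (tar or zip of the $dbpath/diagnostic.data directory) can be sketched in Python with the standard library. The function name `archive_diagnostic_data` and the example paths are hypothetical; the output file should be written somewhere outside the directory being archived.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_file: str) -> Path:
    """Create a gzip-compressed tar of <dbpath>/diagnostic.data."""
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    out = Path(out_file)
    with tarfile.open(out, "w:gz") as tar:
        # Store entries under a stable top-level name rather than the full path.
        tar.add(src, arcname="diagnostic.data")
    return out


# Example call with assumed paths (adjust to the real dbpath):
# archive_diagnostic_data(r"C:\data\db", r"C:\temp\diagnostic.data.tar.gz")
```

The archive preserves the directory's metrics files as they are, so it is suitable for attaching to a ticket or sending through an upload portal.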

Generated at Thu Feb 08 04:55:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.