[SERVER-46562] Shard Crashes with out of memory Created: 03/Mar/20  Updated: 27/Oct/23  Resolved: 31/May/20

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hermann Jaun Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2020-03-11-01-14-31-322.png     PNG File image-2020-03-30-14-40-23-844.png    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Hi

We face an issue when we run many map-reduce operations on a sharded cluster.
The memory used increases steadily, and when it reaches the limit of the host memory, the mongod process crashes with an out-of-memory exception.

Environment:

  • 24 shards, 80 GByte RAM each
  • Default value for storage.wiredTiger.engineConfig.cacheSizeGB (no value set in the config)
  • Continuous parallel map-reduce operations running on different input and output collections (Java application with 10 threads)
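For reference, the cache cap mentioned above lives in mongod.conf. A minimal sketch — the 35 GByte value is illustrative, matching the figure tried later in this ticket, and should be tuned per host:

```yaml
# mongod.conf sketch — value illustrative, tune per deployment
storage:
  wiredTiger:
    engineConfig:
      # When unset, WiredTiger defaults to the larger of
      # 50% of (RAM - 1 GB) or 256 MB (~39.5 GB on an 80 GB host).
      cacheSizeGB: 35
```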

We understand that MongoDB may use memory up to the maximum available RAM.
But the mongod process should never crash.

Many thanks for any feedback!

 



 Comments   
Comment by Hermann Jaun [ 29/Apr/20 ]

Hi Dmitry
No, we have had a much more stable situation since the new setting for storage.wiredTiger.engineConfig.cacheSizeGB.
I propose you close the ticket and we will re-open it in case we have another server crash.

Thanks!

Hermann

 

Comment by Dmitry Agranat [ 28/Apr/20 ]

Hi hermann.jaun@comfone.com, did this issue reoccur?

Comment by Dmitry Agranat [ 02/Apr/20 ]

Hi rui.ribeiro@comfone.com, I believe my answer is very similar to what was discussed in this comment. In short, if there is no "pressure" from the cache/memory pov, that index that you've mentioned should not be flushed. Of course, the LRU algorithm is not that simple but I just wanted to provide you with a very short and simplistic explanation. For such questions, I encourage you to ask our community by posting on the MongoDB Community Forums or on Stack Overflow with the mongodb tag.
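To make the "simplistic explanation" above concrete, here is a toy LRU cache in Python — this is an illustration of the general least-recently-used idea, not WiredTiger's actual eviction algorithm. An entry is evicted only when the cache is full, i.e. under pressure; an idle cache never drops anything:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: entries are evicted only when capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("idx_a", 1)
cache.put("idx_b", 2)
cache.get("idx_a")        # touch idx_a so it is "recent"
cache.put("idx_c", 3)     # cache full: idx_b (least recent) is evicted
print(list(cache.data))   # ['idx_a', 'idx_c']
```

This is why an unused index can stay cached indefinitely: without new data competing for space, nothing triggers eviction.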

Comment by Rui Ribeiro [ 30/Mar/20 ]

Hi
dmitry.agranat 
 
I work with Hermann and it is true that none of the servers crashed.
 
But the memory used started to increase (indexes cached) with queries and map reduce. After the memory almost reached the limit, the servers didn't crash, but everything (queries, map reduces, etc.) was much slower.
 
My question to you is very simple: I understand that MongoDB caches indexes so queries are faster, but if an index has not been used for the last X days, should MongoDB not release the memory cached for that index? In this case the only way to release that memory was to restart all the nodes. Do we have a way (a configuration) to make MongoDB release memory, similar to the garbage collector in Java?
 

 
Thank you.

Cheers
 
Rui

Comment by Hermann Jaun [ 30/Mar/20 ]

Ok, I agree.
Thank you!

Hermann

Comment by Dmitry Agranat [ 30/Mar/20 ]

hermann.jaun@comfone.com, I suggest we wait another 2 weeks on this ticket in case this issue occurs again. If it doesn't, we'll close this ticket and review new data in a new ticket if it reoccurs sometime in the future. Does that make sense?

Comment by Hermann Jaun [ 30/Mar/20 ]

Hi @Dmitry

None of the servers crashed after setting storage.wiredTiger.engineConfig.cacheSizeGB to 35 GByte.

I don't know if you want to close this ticket and have us open a new one as soon as we get the error again?

Best Regards,

Hermann

 

Comment by Dmitry Agranat [ 29/Mar/20 ]

Hi hermann.jaun@comfone.com,

Has the issue reoccurred since you initially reported it? If so, were you able to collect the requested information and upload it to the secure portal?

Thanks,
Dima

Comment by Dmitry Agranat [ 13/Mar/20 ]

Hi hermann.jaun@comfone.com,

The SERVER project is for bugs and feature suggestions for the MongoDB server. If you need further assistance with understanding how MongoDB manages memory, I encourage you to ask our community by posting on the MongoDB Community Forums or on Stack Overflow with the mongodb tag.

In short, the OS manages allocation and deallocation of process memory, and it is the OS that decides when/if to release memory, for example when there is "memory pressure". I assume this is not the case in your example. As for the MongoDB cache, we keep MongoDB data in the cache because it might be needed again, and it's most efficient to retrieve it from the cache. Releasing cached data back to the OS defeats the purpose of a cache. Also, please keep in mind that apart from the WT cache (which is 50% of the total RAM by default), there is memory we consume outside the WT cache (connections, in-memory sorts, buffers for intermediate data sets and sorting in aggregate pipelines, memory for de-duplicating results with $in or $or conditions, memory for scoring full text index results, cursors, plan cache...).
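The "50% of total RAM" figure above is a simplification; per the MongoDB documentation, the default WiredTiger internal cache is the larger of 50% of (RAM − 1 GB) or 256 MB. A small Python sketch of that calculation (the function name is ours):

```python
def default_wt_cache_gb(total_ram_gb: float) -> float:
    """Approximate default WiredTiger internal cache size in GB.

    Per the MongoDB docs: the larger of 50% of (RAM - 1 GB) or 256 MB.
    """
    return max(0.5 * (total_ram_gb - 1.0), 0.25)

# On the 80 GB hosts in this ticket the default works out to ~39.5 GB,
# which is why an explicit 35 GB cap lowers overall memory use.
print(default_wt_cache_gb(80))  # 39.5
```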

There is no parameter to release MongoDB memory on a running process.

Thanks,
Dima

Comment by Rui Ribeiro [ 11/Mar/20 ]

Hi @Dmitry Agranat

 

I would also like to understand how MongoDB manages memory. Even when I am not running any operation, the memory is never fully released, as you can see in the graph below from one of my shards: it never drops back to the initial value. Which parameters can we set to force MongoDB to release all the memory when nothing is running?

Thank you.

Comment by Dmitry Agranat [ 05/Mar/20 ]

Hi hermann.jaun@comfone.com, looking at the mongod logs together with diagnostic.data is the key here to identify the root cause. If the issue happens again, please grab both and upload to the secure portal.

Comment by Hermann Jaun [ 05/Mar/20 ]

Hi Dmitry

I have uploaded the requested diagnostic data, plus a screenshot showing the memory usage when the crash happened on 1.3.2020 17:30 CET (UTC + 1).
The server runs in the UTC timezone.

The logs are unfortunately already deleted; we have too many transactions to keep them that long. But next time we will save the respective log file.

I would like to mention that in the meantime, on 3.3.2020, we set the parameter storage.wiredTiger.engineConfig.cacheSizeGB to 35 GByte.
It looks like the memory consumption has stayed much lower than before since then (same load). It has been quite stable at around 58 GByte for 2 days.
But we have to observe the behaviour for a longer period.
Can this be the solution?
What is the MongoDB server using the RAM above the cache size for?

Best Regards,
Hermann

 

 

Comment by Dmitry Agranat [ 04/Mar/20 ]

Hi hermann.jaun@comfone.com,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from the server in question and upload them to this support uploader location?

Please also clarify the exact time and the time zone of the reported issue.

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
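The archiving step above can be sketched in Python (the log and dbpath locations are assumptions; point them at your deployment — missing paths are simply skipped):

```python
import tarfile
from pathlib import Path

# Assumed locations; substitute your mongod.log path and $dbpath.
log_path = Path("/var/log/mongodb/mongod.log")
diag_dir = Path("/data/db/diagnostic.data")

# Bundle whatever exists into one gzipped tarball for upload.
with tarfile.open("mongod-diagnostics.tar.gz", "w:gz") as tar:
    for p in (log_path, diag_dir):
        if p.exists():
            tar.add(p, arcname=p.name)
```

Plain `tar czf mongod-diagnostics.tar.gz <mongod.log> <dbpath>/diagnostic.data` achieves the same from a shell.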

Thank you
Dima

Comment by Hermann Jaun [ 03/Mar/20 ]

Hi Dmitry

Hereafter the additional info:

  • MongoDB server version 4.0.16
  • We use 1 mongod process only on each server
  • No other applications on the server (standard Linux installation)

 

Comment by Dmitry Agranat [ 03/Mar/20 ]

hermann.jaun@comfone.com, what MongoDB version do you use, and how many mongod processes does each server run?

Generated at Thu Feb 08 05:11:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.