[SERVER-31421] MongoDB crashes with "Out of memory" Created: 05/Oct/17  Updated: 16/Nov/21  Resolved: 12/Feb/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.6
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Tanveer Madan Marate Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diag.tgz     File diagnostic.data.tgz     File messages.syslog     Text File onecop.log     File onecop1_Oct3.log.tgz    
Participants:

 Description   

MongoDB crashes with "Out of memory"

Tue Oct  3 10:08:08.566 I COMMAND  [conn13175] command admin.$cmd command: moveChunk { moveChunk: "fuji.7z030", shardVersion: [ Timestamp 17000|3, ObjectId('59cb39c5ea60c39c85b0a58c') ], epoch: ObjectId('59cb39c5ea60c39c85b0a58c'), configdb: "onecopc/xsj-dvdbone07:27040", fromShard: "onecop1", toShard: "onecop2", min: { stdfid: -7776614685089660447 }, max: { stdfid: -7766145972179898906 }, chunkVersion: [ Timestamp 17000|3, ObjectId('59cb39c5ea60c39c85b0a58c') ], maxChunkSizeBytes: 67108864, waitForDelete: false, takeDistLock: false } numYields:5742 reslen:437 locks:{ Global: { acquireCount: { r: 11493, w: 3 } }, Database: { acquireCount: { r: 5745, w: 3 } }, Collection: { acquireCount: { r: 5745, W: 3 } } } protocol:op_command 1223ms
Tue Oct  3 10:08:08.578 I SHARDING [conn13175] request split points lookup for chunk fuji.7z030 { : -7776614685089660447 } -->> { : -7766145972179898906 }
Tue Oct  3 10:08:08.618 F -        [conn13626] out of memory.
 
 0x7f4200367a41 0x7f4200367074 0x7f42002d5001 0x7f41ff67d0d5 0x7f4200111595 0x7f41ff8620c1 0x7f41ff863bf1 0x7f41ffe7bef0 0x7f41ffa81d68 0x7f41ff67fd4d 0x7f41ff68067d 0x7f42002cf981 0x7f41fda41df5 0x7f41fd76f1ad
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"7F41FEDF6000","o":"1571A41","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F41FEDF6000","o":"1571074","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"7F41FEDF6000","o":"14DF001","s":"_ZN5mongo12mongoReallocEPvm"},{"b":"7F41FEDF6000","o":"8870D5","s":"_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi"},{"b":"7F41FEDF6000","o":"131B595","s":"_ZN5mongo3rpc19CommandReplyBuilder22getInPlaceReplyBuilderEm"},{"b":"7F41FEDF6000","o":"A6C0C1","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"7F41FEDF6000","o":"A6DBF1","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"7F41FEDF6000","o":"1085EF0","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"7F41FEDF6000","o":"C8BD68","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"7F41FEDF6000","o":"889D4D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"7F41FEDF6000","o":"88A67D"},{"b":"7F41FEDF6000","o":"14D9981"},{"b":"7F41FDA3A000","o":"7DF5"},{"b":"7F41FD679000","o":"F61AD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.6", "gitVersion" : "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.2.2.el7.x86_64", "version" : "#1 SMP Wed Nov 16 13:15:13 EST 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "7F41FEDF6000", "elfType" : 3, "buildId" : "60D7165C62663B8B732487CBB271FFEC7B04C8A6" }, { "b" : "7FFF945E1000", "elfType" : 3, "buildId" : "5B9ED38E31CE6BD04ECCA183AD6D6EE05A4535D0" }, { "b" : "7F41FE964000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "8B4A33094EA982F927F4D5F84059EB073A203DB5" }, { "b" : "7F41FE57A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, 
"buildId" : "7455CBD6F62579DA1598F1DC123F039F25466C90" }, { "b" : "7F41FE372000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "DE43F3E59399601F2D40096B54D05FFA065F6DDA" }, { "b" : "7F41FE16E000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "7E9A1A8B08DB426D5E349DB8B2D11B8BC0442477" }, { "b" : "7F41FDE6C000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "4DB1EDF3A02BB05820F9DAB5DF06A324FA09FF54" }, { "b" : "7F41FDC56000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3D06B234BB28280F8B45C2A3B76DBFD9986FC7F5" }, { "b" : "7F41FDA3A000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "505287E8736961F603241188C0319B217005E7CF" }, { "b" : "7F41FD679000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "749D3F9CD5D026324C52F8BF2E8037B6A18AFEB7" }, { "b" : "7F41FEBD2000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "92EA16F0331C16FC99B67B6643A50E4C7E21FAAC" }, { "b" : "7F41FD42D000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "511A7F1757F2ABE2894651589D91269DAB895B86" }, { "b" : "7F41FD14A000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "77B09D93C29E0D455D45790DE86303C07B003035" }, { "b" : "7F41FCF46000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "B25574847B066A26CD593C8101DF6779898FF2C2" }, { "b" : "7F41FCD14000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "6CD647AAA0631C7AF590B436DA74A6F373B5D6BB" }, { "b" : "7F41FCAFE000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "8934632E74819BCC23A16BD5659F1FFBB5243D93" }, { "b" : "7F41FC8EF000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "88AECC78BED1C6909C11C23C091DDDAC69BB1D30" }, { "b" : "7F41FC6EB000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7F41FC4D1000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : 
"783A909BC6E9945180505C80B3A01D2D518C7B5F" }, { "b" : "7F41FC2AA000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "6A5EF7A05F7E488FCD280BFADD96083BEC9FD416" }, { "b" : "7F41FC049000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "48073BD2BFFD1255A1AAB572CA1C3DC53AF5CD2A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7f4200367a41]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x7f4200367074]
 mongod(_ZN5mongo12mongoReallocEPvm+0x21) [0x7f42002d5001]
 mongod(_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi+0x55) [0x7f41ff67d0d5]
 mongod(_ZN5mongo3rpc19CommandReplyBuilder22getInPlaceReplyBuilderEm+0x35) [0x7f4200111595]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0xB1) [0x7f41ff8620c1]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF81) [0x7f41ff863bf1]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x7f41ffe7bef0]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD38) [0x7f41ffa81d68]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x7f41ff67fd4d]
 mongod(+0x88A67D) [0x7f41ff68067d]
 mongod(+0x14D9981) [0x7f42002cf981]
 libpthread.so.0(+0x7DF5) [0x7f41fda41df5]
 libc.so.6(clone+0x6D) [0x7f41fd76f1ad]
-----  END BACKTRACE  -----



 Comments   
Comment by Ramon Fernandez Marina [ 12/Feb/18 ]

This ticket has been dormant for a while so I'm going to close it.

For further MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.

Comment by Bruce Lucas (Inactive) [ 26/Oct/17 ]

Hi Tanveer,

The situation you describe:

  • mongod resident memory is a little more than configured cache
  • mongod virtual memory is quite a bit larger

is normal and expected. The resident memory is actual physical memory in use by mongod, whereas virtual memory is just address space, only the resident portion of which is backed by real memory. The reason this situation arises has to do with the way the allocator manages memory. Every installation of mongod will show this characteristic, and I am not aware of any circumstances where a large virtual memory causes problems - it is the resident memory that is the potentially scarce resource.
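The distinction between resident and virtual memory can be checked directly on Linux. The following is a minimal sketch, assuming the standard `VmRSS` and `VmSize` fields of `/proc/<pid>/status`; the function name and the sample text are hypothetical, with the sample sized to roughly match the figures reported in this ticket.

```python
import re

def parse_vm_fields(status_text):
    """Return VmRSS and VmSize in kB from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(VmRSS|VmSize):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

# Hypothetical sample, not taken from the affected host.
SAMPLE_STATUS = "Name:\tmongod\nVmSize:\t241172480 kB\nVmRSS:\t152043520 kB\n"

vm = parse_vm_fields(SAMPLE_STATUS)
resident_gib = vm["VmRSS"] / 1024 ** 2    # kB -> GiB
virtual_gib = vm["VmSize"] / 1024 ** 2
print(f"resident {resident_gib:.0f} GiB, virtual {virtual_gib:.0f} GiB")
```

On a live host the same function could be fed the contents of `/proc/<pid>/status` for the mongod process; only the resident figure reflects physical memory in use.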

The previous diagnostic data ended a few seconds before the out-of-memory crash (because the diagnostic data is only flushed to disk every few seconds, for performance reasons), so we couldn't be sure that mongod hadn't allocated a lot of memory in those few seconds.

The new set of mongod data has the same characteristic; however, the syslog shows that some other applications were also reporting out-of-memory errors:

Oct 19 14:17:16 hostname puppet-agent[41021]: (/Stage[main]/Automic/Service[wlaapps]) Could not evaluate: Cannot allocate memory - fork(2)
Oct 19 14:17:16 hostname puppet-agent[41021]: Puppet::Util::FileType::FileTypeCrontab could not read root: Cannot allocate memory - crontab -l 2>/dev/null
Oct 19 14:17:16 hostname puppet-agent[41021]: Failed to apply catalog: Puppet::Util::FileType::FileTypeCrontab could not read root: Cannot allocate memory - crontab -l 2>/dev/null

Crucially, these errors occurred a few minutes before mongod crashed at 14:21, and the mongod diagnostic data shows that it is not using excessive resident memory at that point.

Given the above, I don't see any evidence of a bug in mongod that is causing the out-of-memory errors. Unfortunately, I also can't identify the cause. If you have the Linux OOM killer enabled, it may be activated the next time this event occurs, in which case it will collect detailed memory statistics and write them to syslog. If you are able to collect that information, we could take a look to get a better idea of what's going on.
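One way to collect the relevant syslog entries is to filter for memory-related messages. This is a rough sketch only: the function and pattern names are hypothetical, the line format is assumed to be the traditional syslog layout seen in the excerpts above, and the two kernel patterns (`invoked oom-killer`, `Out of memory: Kill`) are assumptions about what the OOM killer typically logs rather than text taken from this host.

```python
import re

# Assumed patterns: the allocation-failure text quoted in this ticket,
# plus messages the Linux kernel typically logs when the OOM killer runs.
MEMORY_PATTERNS = re.compile(
    r"Cannot allocate memory|invoked oom-killer|Out of memory: Kill"
)
# "Mon DD HH:MM:SS host process[pid]: message" (the pid part is optional).
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) \S+ ([^:\[]+)(?:\[\d+\])?: (.*)$")

def memory_events(syslog_lines):
    """Return (timestamp, process, message) for lines reporting memory pressure."""
    events = []
    for line in syslog_lines:
        match = SYSLOG_LINE.match(line)
        if match and MEMORY_PATTERNS.search(match.group(3)):
            events.append(match.groups())
    return events

# Two of the syslog lines quoted in this ticket.
SAMPLE = [
    "Oct 19 14:17:16 hostname puppet-agent[41021]: (/Stage[main]/Automic/Service[wlaapps]) Could not evaluate: Cannot allocate memory - fork(2)",
    "Oct 19 14:17:16 hostname puppet-agent[41021]: Puppet::Util::FileType::FileTypeCrontab could not read root: Cannot allocate memory - crontab -l 2>/dev/null",
]

events = memory_events(SAMPLE)
for timestamp, process, message in events:
    print(timestamp, process, message)
```

Listing which processes report allocation failures, and when, makes it easier to compare them against the time of the mongod crash.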

Thanks,
Bruce

Comment by Tanveer Madan Marate [ 24/Oct/17 ]

Hi Bruce,

I have uploaded the required files.
By "same symptoms" I mean the following observations from our side:
1. A read stress test was in progress with 100 sessions.
2. The WT cache was set to 120GB out of the 256GB available on the host, using the cacheSizeGB parameter.
3. We are using Telegraf to monitor MongoDB, and found that the resident memory was around 145GB while the virtual memory was 230GB.
4. Checking the mongod process with the top command shows the same figures as point 3.

onecop.log messages.syslog

Hope this clarification helps. I am trying to understand the memory usage: to restrict the use of memory we limited the WT cache to 120GB, but even after that the mongod process seems to be using a lot of memory.
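The cache setting limits only the WiredTiger cache, not the total memory of the mongod process, so some gap between the two figures is to be expected. A minimal sketch of that comparison follows, assuming the `mem.resident`, `mem.virtual` (both in MiB) and `wiredTiger.cache["maximum bytes configured"]` fields of a serverStatus document; the function name and the sample document are hypothetical, with values taken from the figures quoted in this comment.

```python
def memory_summary(server_status):
    """Compare configured cache size with process memory, all in GiB."""
    cache_bytes = server_status["wiredTiger"]["cache"]["maximum bytes configured"]
    mem = server_status["mem"]
    cache_gib = cache_bytes / 1024 ** 3
    resident_gib = mem["resident"] / 1024   # serverStatus reports MiB
    virtual_gib = mem["virtual"] / 1024
    return {
        "cache_configured_gib": cache_gib,
        "resident_gib": resident_gib,
        "virtual_gib": virtual_gib,
        # Memory held outside the configured cache limit.
        "resident_minus_cache_gib": resident_gib - cache_gib,
    }

# Hypothetical serverStatus fragment built from the numbers in this comment.
SAMPLE_STATUS = {
    "mem": {"resident": 145 * 1024, "virtual": 230 * 1024},
    "wiredTiger": {"cache": {"maximum bytes configured": 120 * 1024 ** 3}},
}

summary = memory_summary(SAMPLE_STATUS)
print(summary)
```

With a driver such as pymongo, the input document could presumably be obtained with `client.admin.command("serverStatus")`.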

Thanks,
Tanveer

Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ]

Thanks Tanveer.

Can you please also attach the mongod log file showing the crash, and clarify what you mean when you said it crashed with the same symptoms?

Also can you please attach the syslog file covering the latest crash?

Thanks,
Bruce

Comment by Tanveer Madan Marate [ 24/Oct/17 ]

Hi Bruce,

Thanks for the response !!!
Attached is the diagnostic.data for the crash this time round.
Unfortunately, I do not have the syslog for the previous crash.
diag.tgz

Thanks,
Tanveer

Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ]

Hi Tanveer,

Can you please archive the diagnostic.data directory and attach to this ticket, along with the log file showing the crash? I would like to check whether all the circumstances are the same.

Also, if you still have the syslog file showing the OOM details for the original incident can you attach that? It will have further information about memory usage.

Thanks,
Bruce

Comment by Tanveer Madan Marate [ 23/Oct/17 ]

Hi Bruce,

The database crashed again today with identical symptoms, the exception being that OOM was not invoked this time.
Can you please suggest any fixes that we can use to avoid this issue?

Thanks,
Tanveer

Comment by Tanveer Madan Marate [ 05/Oct/17 ]

Hi Bruce,

Thanks for your reply !!!
There is no other mongod process running on the server; it runs only one mongod.
Also, we have set the WT cache size to 120GB.

Below are the limits for the mongod process (note that I restarted mongod after the crash):

bash-4.2$ cat /proc/$(pidof mongod)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1024000              1024000              processes
Max open files            1024000              1024000              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1031075              1031075              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Thanks,
Tanveer

Comment by Bruce Lucas (Inactive) [ 05/Oct/17 ]

Hi Tanveer,

Thanks for providing the diagnostic.data and logs.

I'm not finding an indication that mongod was using an excessive amount of memory. The last data point recorded in diagnostic.data was at 10-03 17:08:01 UTC, about 7 seconds before the OOM. At this point mongod was using 150 GB resident memory, well short of the 256 GB of memory on the machine.

It was using about 230 GB of virtual memory (78 GB of which was unmapped, leaving only the 150 GB resident that I mentioned above). Can you check the ulimits for mongod to make sure they are not limiting the virtual memory size of mongod? The best way to do this is:

cat /proc/$(pidof mongod)/limits

which will check the limits in effect for the actual running mongod process. If you post the results here we can check whether process memory limits are the cause of the issue.
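The output of that command can also be checked programmatically. This is a minimal sketch, assuming the column layout of `/proc/<pid>/limits` in which fields are separated by two or more spaces; the function names and the choice of which limits count as memory-related are hypothetical. The sample rows are copied from the limits output posted in this ticket.

```python
import re

MEMORY_LIMITS = ("Max address space", "Max data size", "Max resident set")

def parse_limits(limits_text):
    """Map each limit name to its (soft, hard) values."""
    limits = {}
    for line in limits_text.splitlines()[1:]:   # skip the header row
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits

def restrictive_memory_limits(limits):
    """Return the memory-related limits whose soft value is not unlimited."""
    return {
        name: limits[name]
        for name in MEMORY_LIMITS
        if name in limits and limits[name][0] != "unlimited"
    }

SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max data size             unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
"""

limits = parse_limits(SAMPLE)
restricted = restrictive_memory_limits(limits)
print(restricted or "no restrictive memory limits found")
```

An empty result suggests that process limits are not what is constraining mongod's address space.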

Also, can you confirm that there is nothing else running on the machine consuming memory, for example another mongod process? If that is the case you will need to adjust the cache limit used by mongod correspondingly.

Thanks,
Bruce
