[SERVER-19917] MongoDB crashed while loading bulk data Created: 13/Aug/15  Updated: 25/Aug/15  Resolved: 25/Aug/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Venkatesh Sankar Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Mongo_3_0_5_bload_stats.txt    
Issue Links:
Duplicate
Operating System: ALL
Participants:

 Description   

MongoDB is crashing frequently (about once an hour). I am running a single-node MongoDB 3.0.4 with WiredTiger on 3.14.26-24.46.amzn1.x86_64.

The log says out of memory, but the instance still has 50% of its disk space available.

2T15:54:11.419+0000 F -        [conn2833] out of memory.
 
 0xf7a0e9 0xf79c19 0xefcef2 0xf2a843 0xf2d1f7 0x7f0ff2848df3 0x7f0ff12fc1ad
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B7A0E9"},{"b":"400000","o":"B79C19"},{"b":"400000","o":"AFCEF2"},{"b":"400000","o":"B2A843"},{"b":"400000","o":"B2D1F7"},{"b":"7F0FF2841000","o":"7DF3"},{"b":"7F0FF1206000","o":"F61AD"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "3.14.26-24.46.amzn1.x86_64", "version" : "#1 SMP Wed Dec 10 10:02:43 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7CCB6364CD6044656089121D1BD461EAD6D71352" }, { "b" : "7FFF760FE000", "elfType" : 3, "buildId" : "619DC2243CAEDC45A9CBC6932D90C3643318760A" }, { "b" : "7F0FF2841000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F0FF25D4000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F711D67FF0C1FE2222FB003A30AB74DA26A5EF41" }, { "b" : "7F0FF21EF000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "777069F5EECC26CD66C5C8390FA2BF4E444979D1" }, { "b" : "7F0FF1FE7000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F0FF1DE3000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F0FF1ADF000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F0FF17DD000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F0FF15C7000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F0FF1206000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "A14FC690F08FB799BA8CC82D49DE9AA9D4580464" }, { "b" : "7F0FF2A5D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : 
"7F0FF0FC3000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "72C1DB5E2447A90D1BF34065BCC031B7263FFBAC" }, { "b" : "7F0FF0CDE000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "2B8787E8E0C317CF820E5D830D923BC744E497F4" }, { "b" : "7F0FF0ADB000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F0FF08B0000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7F0FF069A000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "87B4EBF2183C8EA4AB657212203EFFE6340E2F4F" }, { "b" : "7F0FF048F000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7F0FF028C000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F0FF0072000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F0FEFE51000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "803D7EF21A989677D056E52BAEB9AB5B154FB9D9" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf7a0e9]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf79c19]
 mongod(_ZN5mongo11mongoMallocEm+0x22) [0xefcef2]
 mongod(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x143) [0xf2a843]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x327) [0xf2d1f7]
 libpthread.so.0(+0x7DF3) [0x7f0ff2848df3]
 libc.so.6(clone+0x6D) [0x7f0ff12fc1ad]

Any help would be much appreciated.



 Comments   
Comment by Ramon Fernandez Marina [ 25/Aug/15 ]

vengireturns@gmail.com, this machine is seriously underpowered to run MongoDB. For starters, the default WiredTiger cache size is 1GB, so it's not surprising that mongod quickly runs out of memory.

You may be able to get things to work by lowering the WiredTiger cache considerably; for example, here's the command line argument to set it to 100M:

--wiredTigerEngineConfigString="cache_size=100M"

Needless to say there's a performance tradeoff, so you may want to consider a machine with a lot more memory.
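
As a rough illustration of the suggestion above, the following Python sketch assembles a mongod argument list with a reduced WiredTiger cache. The helper names and the /data/db path are hypothetical and chosen only for illustration; the sketch relies on nothing beyond the --wiredTigerEngineConfigString option quoted above and the standard --storageEngine and --dbpath options.

```python
from typing import List


def wiredtiger_cache_arg(cache_mb: int) -> str:
    """Return the mongod argument that sets the WiredTiger cache to cache_mb megabytes."""
    if cache_mb <= 0:
        raise ValueError("cache size must be a positive number of megabytes")
    # No shell quoting is needed when the argument is passed as a list element.
    return f"--wiredTigerEngineConfigString=cache_size={cache_mb}M"


def mongod_command(dbpath: str, cache_mb: int) -> List[str]:
    """Build an argument list suitable for subprocess.run (hypothetical helper)."""
    return [
        "mongod",
        "--storageEngine", "wiredTiger",
        "--dbpath", dbpath,
        wiredtiger_cache_arg(cache_mb),
    ]


# Print the command line for a 100M cache; /data/db is an assumed path.
print(" ".join(mongod_command("/data/db", 100)))
```

Passing the arguments as a list avoids the shell quoting used in the command line above. The right cache value for a given machine is not something this sketch can determine.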

I'm closing this ticket as I don't see evidence of a bug, and we keep the SERVER project for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. See also our Technical Support page for additional support resources.

Regards,
Ramón.

Comment by Venkatesh Sankar [ 25/Aug/15 ]

Ramon,
The instance has 1 GB of memory and 30 GB of disk space. No swap is configured, and no cgroups have been configured.

Our development team is loading the data from a Spring Data repository using a Java program (12 million documents in chunks of 10,000 each), so I can't share the data with you.
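
The chunked load described above can be sketched roughly as follows. This is not the reporter's Java program: it is a minimal Python illustration of the batching pattern only, and insert_batch is a hypothetical callback standing in for whatever bulk-insert call the driver provides.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(
    docs: Iterable[T],
    insert_batch: Callable[[List[T]], None],  # hypothetical bulk-insert callback
    size: int = 10_000,
) -> int:
    """Send `docs` to `insert_batch` one chunk at a time; return the number sent."""
    total = 0
    for batch in chunked(docs, size):
        insert_batch(batch)
        total += len(batch)
    return total
```

Only one chunk is held in client memory at a time. The sketch says nothing about memory use on the server side, which is what this ticket is about.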

Please let me know if you need any further details.

Comment by Venkatesh Sankar [ 25/Aug/15 ]

Output from ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 7826
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 32000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Comment by Ramon Fernandez Marina [ 25/Aug/15 ]

Hi vengireturns@gmail.com, apologies for the radio silence. The mongostat output you sent shows an increase in memory usage, but it should not be enough to trigger this issue. Can you please tell us:

  • how much memory you have in this machine
  • the output of "ulimit -a"
  • whether you have cgroups configured
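
For readers collecting the same details on a Linux host, a minimal Python sketch is shown below. It assumes a Linux /proc filesystem, and the function names are hypothetical. Note that /proc/self/cgroup only lists the cgroups of the calling process; checking mongod itself would mean reading /proc/<pid>/cgroup for its pid, and membership alone does not show whether a memory limit applies.

```python
import resource
from typing import Dict, List, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}; most fields are in kB."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def address_space_limit() -> Tuple[int, int]:
    """Soft and hard RLIMIT_AS in bytes, the 'virtual memory' row of ulimit -a.

    An unlimited value is reported as resource.RLIM_INFINITY.
    """
    return resource.getrlimit(resource.RLIMIT_AS)


def cgroup_memberships(path: str = "/proc/self/cgroup") -> List[str]:
    """Return the cgroup lines for a process, or an empty list if unreadable."""
    try:
        with open(path) as fh:
            return [line.strip() for line in fh if line.strip()]
    except OSError:
        return []
```

On a running system, parse_meminfo(open("/proc/meminfo").read()) gives MemTotal and SwapTotal, which cover the first question above.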

Also, what kind of bulk loading are you doing and how? Which driver are you using? I'm asking because if you provide enough details, even share the data you're uploading[1], we can try to reproduce locally.

Thanks,
Ramón.

[1] If you can upload your data let me know and I'll create an upload portal for you; JIRA only supports 150MB uploads.

Comment by Venkatesh Sankar [ 24/Aug/15 ]

Ramon, can you please update me on this?

Comment by Venkatesh Sankar [ 14/Aug/15 ]

I am attaching the mongostat details from the start of the load until it crashed.

Comment by Venkatesh Sankar [ 14/Aug/15 ]

Hi Ramon,

Thanks for your quick response. I tested the load again on 3.0.5. It performed well this time; however, at the end of the load it failed again with a memory issue. The error is below for your reference.

2015-08-14T15:04:42.547+0000 E STORAGE  WiredTiger (12) [1439564682:514528][7195:0x7f86ccb52700], file:collection-4--6473704670305184383.wt, session.checkpoint: memory allocation: Cannot allocate memory
2015-08-14T15:04:42.553+0000 E STORAGE  WiredTiger (12) [1439564682:551986][7195:0x7f86ccb52700], file:collection-4--6473704670305184383.wt, session.checkpoint: ignoring not-fatal error during parent page split: Cannot allocate memory
2015-08-14T15:04:42.583+0000 E STORAGE  WiredTiger (12) [1439564682:564580][7195:0x7f86ccb52700], checkpoint-server: checkpoint server error: Cannot allocate memory
2015-08-14T15:04:42.588+0000 E STORAGE  WiredTiger (-31804) [1439564682:588826][7195:0x7f86ccb52700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-08-14T15:04:42.604+0000 I -        Fatal Assertion 28558
2015-08-14T15:04:42.622+0000 I -        [conn14] Fatal Assertion 28559
2015-08-14T15:04:45.076+0000 I CONTROL
 0xf77ca9 0xf16b61 0xefb6e1 0xda193a 0x13aa2f9 0x13aa4b5 0x13aa954 0x133f3b3 0x7f86d8a89df3 0x7f86d753d1bd
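
The number in parentheses after "WiredTiger" in the lines above appears to be the error code: 12 matches ENOMEM on Linux, consistent with "Cannot allocate memory", and -31804 is printed alongside WT_PANIC. A minimal Python sketch for pulling those codes out of such log lines follows; the regular expression is an assumption based only on the lines shown here, not on a documented log format.

```python
import errno
import re
from typing import Optional, Tuple

# Matches "WiredTiger (<code>) [..][..], <message>" as seen in the log above.
WT_ERROR = re.compile(r"WiredTiger \((-?\d+)\) \[[^\]]*\]\[[^\]]*\], (.*)$")


def parse_wt_error(line: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) for a WiredTiger error line, or None if it is not one."""
    match = WT_ERROR.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def errno_name(code: int) -> Optional[str]:
    """Map a positive OS error number to its symbolic name, if the platform knows it."""
    return errno.errorcode.get(code)
```

Negative codes such as -31804 are not OS error numbers, so errno_name returns None for them.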

Comment by Ramon Fernandez Marina [ 13/Aug/15 ]

vengireturns@gmail.com, I wanted to add a bit more information to this ticket.

The stack trace indicates that an attempt to allocate memory failed, which caused mongod to terminate (this is by design). In 3.0.4 there were a number of cases where WiredTiger would consume large amounts of memory, which can trigger behaviors like the one you described in this ticket. MongoDB 3.0.5 shipped with fixes for these cases, but also included a performance enhancement that may itself increase memory consumption. The relevant tickets are:

MongoDB 3.0.6-rc0 release candidate includes fixes for all three tickets, and we believe it should address the out-of-memory condition you're seeing – hence the suggestion to try it out. If you're open to sharing the data you're bulk-uploading with us we can run the experiments on our end – you can upload data privately and securely here.

If the out-of-memory condition reproduces with 3.0.6-rc0 then we may be looking at a new problem, and we'll ask you to collect some data to investigate further.

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 13/Aug/15 ]

vengireturns@gmail.com, this could be another instance of SERVER-19673. Could you please try this bulk upload with MongoDB 3.0.6-rc0? 3.0.6-rc0 contains a fix for SERVER-19673 and is available for download here.

Thanks,
Ramón.
