[SERVER-3361] mongod crashes after starting with --master Created: 02/Jul/11  Updated: 29/Feb/12  Resolved: 31/Oct/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: James Simpson Assignee: Mathias Stearn
Resolution: Cannot Reproduce Votes: 0
Labels: crash, master, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 5.5, 64-bit


Operating System: Linux
Participants:

 Description   

I've been running MongoDB fine for the past week or so. I shut it down and tried to restart it with the following:

./mongod --dbpath /etc/mongodb/data --fork --logpath /var/log/ mongodb.log --logappend --quiet --journal --bind_ip 127.0.0.1 – nohttpinterface

However, the mongod process quickly disappeared and I had to restart without --master to get it running again. This is what I got in my log (my database has less than 500MB of data since I've just been doing testing with it, and the box it is on has 8GB of RAM):

Fri Jul 1 21:15:09 [initandlisten] waiting for connections on port 27017
Fri Jul 1 21:15:09 [initandlisten] master=true
Fri Jul 1 21:15:09 BackgroundJob starting: snapshot
Fri Jul 1 21:15:09 BackgroundJob starting: ClientCursorMonitor
Fri Jul 1 21:15:09 [initandlisten] ******
Fri Jul 1 21:15:09 [initandlisten] creating replication oplog of size: 20447MB...
Fri Jul 1 21:15:09 [initandlisten] create collection local.oplog.$main

{ size: 21440500326.4, capped: true, autoIndexId: false }

Fri Jul 1 21:15:09 [FileAllocator] allocating new datafile /etc/mongodb/data/local.ns, filling with zeroes...
Fri Jul 1 21:15:09 [FileAllocator] done allocating datafile /etc/mongodb/data/local.ns, size: 16MB, took 0.032 secs
Fri Jul 1 21:15:09 [FileAllocator] allocating new datafile /etc/mongodb/data/local.0, filling with zeroes...
Fri Jul 1 21:15:09 [FileAllocator] done allocating datafile /etc/mongodb/data/local.0, size: 64MB, took 0.155 secs
Fri Jul 1 21:15:09 [FileAllocator] allocating new datafile /etc/mongodb/data/local.1, filling with zeroes...
Fri Jul 1 21:15:10 [FileAllocator] done allocating datafile /etc/mongodb/data/local.1, size: 128MB, took 0.293 secs
Fri Jul 1 21:15:10 [FileAllocator] allocating new datafile /etc/mongodb/data/local.2, filling with zeroes...
Fri Jul 1 21:15:17 [FileAllocator] done allocating datafile /etc/mongodb/data/local.2, size: 2047MB, took 7.342 secs
Fri Jul 1 21:15:17 [FileAllocator] allocating new datafile /etc/mongodb/data/local.3, filling with zeroes...
Fri Jul 1 21:15:24 [FileAllocator] done allocating datafile /etc/mongodb/data/local.3, size: 2047MB, took 7.239 secs
Fri Jul 1 21:15:24 [FileAllocator] allocating new datafile /etc/mongodb/data/local.4, filling with zeroes...
Fri Jul 1 21:15:31 [FileAllocator] done allocating datafile /etc/mongodb/data/local.4, size: 2047MB, took 6.981 secs
Fri Jul 1 21:15:31 [FileAllocator] allocating new datafile /etc/mongodb/data/local.5, filling with zeroes...
Fri Jul 1 21:15:38 [FileAllocator] done allocating datafile /etc/mongodb/data/local.5, size: 2047MB, took 5.9 secs
Fri Jul 1 21:15:38 [initandlisten] ERROR: New namespace: local.oplog.$main
New namespace: local.system.namespaces
mmap private failed with out of memory. (64 bit build)
Fri Jul 1 21:15:38 [initandlisten] Assertion: 13636:createPrivateMap failed (look in log for error)
0x55f5aa 0x629030 0x62999c 0x7826aa 0x769d0b 0x76a02e 0x76a0d3 0x76a2a0 0x77c89a 0x77d553 0x717034 0x698d0f 0x8a70b6 0x8ad38c 0x8ada88 0x8b36bf 0x2b14ef4fe994 0x4e10c9
./mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
./mongod(_ZN5mongo8MongoMMF13finishOpeningEv+0x60) [0x629030]
./mongod(_ZN5mongo8MongoMMF6createESsRyb+0x6c) [0x62999c]
./mongod(_ZN5mongo13MongoDataFile4openEPKcib+0x13a) [0x7826aa]
./mongod(_ZN5mongo8Database7getFileEiib+0x17b) [0x769d0b]
./mongod(_ZN5mongo8Database8addAFileEib+0x2e) [0x76a02e]
./mongod(_ZN5mongo8Database12suitableFileEib+0x63) [0x76a0d3]
./mongod(_ZN5mongo8Database11allocExtentEPKcib+0x70) [0x76a2a0]
./mongod(_ZN5mongo13_userCreateNSEPKcRKNS_7BSONObjERSsPb+0x53a) [0x77c89a]
./mongod(_ZN5mongo12userCreateNSEPKcNS_7BSONObjERSsbPb+0x103) [0x77d553]
./mongod(_ZN5mongo11createOplogEv+0x624) [0x717034]
./mongod(_ZN5mongo16startReplicationEv+0x36f) [0x698d0f]
./mongod(_ZN5mongo6listenEi+0xb6) [0x8a70b6]
./mongod(_ZN5mongo14_initAndListenEiPKc+0x8dc) [0x8ad38c]
./mongod(_ZN5mongo13initAndListenEiPKc+0x18) [0x8ada88]
./mongod(main+0x5acf) [0x8b36bf]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x2b14ef4fe994]
./mongod(__gxx_personality_v0+0x3a1) [0x4e10c9]
Fri Jul 1 21:15:38 [initandlisten] exception in initAndListen std::exception: createPrivateMap failed (look in log for error), terminating
Fri Jul 1 21:15:38 dbexit:
Fri Jul 1 21:15:38 [initandlisten] shutdown: going to close listening sockets...
Fri Jul 1 21:15:38 [initandlisten] shutdown: going to flush diaglog...
Fri Jul 1 21:15:38 [initandlisten] shutdown: going to close sockets...
Fri Jul 1 21:15:38 [initandlisten] shutdown: waiting for fs preallocator...
Fri Jul 1 21:15:38 [initandlisten] shutdown: lock for final commit...
Fri Jul 1 21:15:38 [initandlisten] shutdown: final commit...
Fri Jul 1 21:15:59 [initandlisten] shutdown: closing all files...
Fri Jul 1 21:15:59 closeAllFiles() finished



 Comments   
Comment by Ian Whalen (Inactive) [ 31/Oct/11 ]

@james, I'm closing this as Cannot Reproduce. Please do reopen if you're continuing to experience these mongod crashes.

Comment by Mathias Stearn [ 19/Aug/11 ]

Are you still having trouble?

Comment by Mathias Stearn [ 10/Aug/11 ]

Do you know how many connections there were to that server? We've seen issues where linux allocates a full 8MB per connection even if we only use a few KB. If there are a lot of connections I'd suggest running "ulimt -s 1024" to reduce the stack size to 1MB. We do this automatically in 1.9/2.0 but it must be done manually in 1.8.2.

Comment by James Simpson [ 05/Jul/11 ]

overcommit_memory is 0 and overcommit_ratio is 50. This is running on a VDS, so it is possible that all the space shows as virtual space.

Comment by Eliot Horowitz (Inactive) [ 05/Jul/11 ]

Its 7gb of physical free, but your virtual space if highly used, and the os is probably not willing to overcommit.

Its a little hard to tell without seeing it as its crashing.

Can you add some swap space to test?

Also, can you send the content of:

/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio

Comment by James Simpson [ 05/Jul/11 ]

What makes you say that? 7GB of free RAM isn't enough for MongoDB?

Comment by Eliot Horowitz (Inactive) [ 05/Jul/11 ]

It looks like there isn't enough ram for all the things running on the box.

Comment by James Simpson [ 05/Jul/11 ]

Here are the results from top (nothing showed in mongostat because it could never connect since mongod crashed after the above error happens):

top - 08:20:14 up 1 day, 18:57, 3 users, load average: 0.25, 0.10, 0.06
Tasks: 59 total, 1 running, 56 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8601600k total, 1078348k used, 7523252k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ SWAP COMMAND
5885 varnish 18 0 715m 69m 63m S 0.0 0.8 0:16.18 645m /usr/local/varnish/sbin/varnishd -P /var/run/varnish.pid -a :80 -T localhost:6082 -p thread_pools 3 -p thread_pool_min 80 -p thread_pool_max 3000 -p thread_pool_add_delay 2 -p sess_workspace 32768 -f /etc
1914 root 15 0 141m 49m 2868 S 0.0 0.6 0:02.11 91m /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
3152 root 15 0 141m 48m 1944 S 0.0 0.6 0:00.21 92m spamd child
1563 mysql 18 0 277m 22m 4968 S 0.0 0.3 0:04.71 254m /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --log-error=/var/lib/mysql/s1.leetmedia.com.err --pid-file=/var/lib/mysql/s1.leetmedia.com.pid
25614 root 18 0 73132 21m 1400 S 0.0 0.3 0:06.35 49m lfd - sleeping

Comment by Eliot Horowitz (Inactive) [ 05/Jul/11 ]

Yes - trying to see if something else is using ram, if there is a leak, etc...
mongostat + top would be quite helpful

Comment by James Simpson [ 03/Jul/11 ]

In the original details I state that the box has 8GB of RAM (the database itself is less than 500MB). It also has 480GB of free disk space. Is there anything else you are looking for?

Comment by Eliot Horowitz (Inactive) [ 03/Jul/11 ]

Can you send the stats for the box?
Seems like there is not enough physical memory.

Comment by James Simpson [ 02/Jul/11 ]

Just to add some more information, I've checked the logs this morning (having run without master all night, and my logs are full of a similar error (which has never shown in the logs before). It is being input into the logs every 60 seconds.

Sat Jul 2 10:04:03 [dur] lsn set 228767
Sat Jul 2 10:04:26 [conn107] ERROR: mmap private failed with out of memory. (64 bit build)
Sat Jul 2 10:04:26 [conn107] Assertion: 13636:createPrivateMap failed (look in log for error)
0x55f5aa 0x629030 0x62999c 0x7826aa 0x769d0b 0x76a392 0x76c179 0x76c98b 0x7645a6 0x764dbe 0x7e6973 0x7dd844 0x7de8f1 0x647e45 0x64b3de 0x7547a5 0x759ec8 0x8a8fce 0x8bb630 0x2b225968e73d
./mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
./mongod(_ZN5mongo8MongoMMF13finishOpeningEv+0x60) [0x629030]
./mongod(_ZN5mongo8MongoMMF6createESsRyb+0x6c) [0x62999c]
./mongod(_ZN5mongo13MongoDataFile4openEPKcib+0x13a) [0x7826aa]
./mongod(_ZN5mongo8Database7getFileEiib+0x17b) [0x769d0b]
./mongod(_ZN5mongo8Database12openAllFilesEv+0x22) [0x76a392]
./mongod(_ZN5mongo8DatabaseC1EPKcRbRKSs+0x459) [0x76c179]
./mongod(_ZN5mongo14DatabaseHolder11getOrCreateERKSsS2_Rb+0x5bb) [0x76c98b]
./mongod(_ZN5mongo6Client7Context11_finishInitEb+0x516) [0x7645a6]
./mongod(_ZN5mongo6Client7ContextC1ERKSsSsPNS_9mongolockEb+0x7e) [0x764dbe]
./mongod(_ZN5mongo16CmdListDatabases3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x1a3) [0x7e6973]
./mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x724) [0x7dd844]
./mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7de8f1]
./mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x647e45]
./mongod(ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1+0x324e) [0x64b3de]
./mongod [0x7547a5]
./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x759ec8]
./mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
./mongod(thread_proxy+0x80) [0x8bb630]
/lib64/libpthread.so.0 [0x2b225968e73d]

Comment by James Simpson [ 02/Jul/11 ]

Yes, sorry. What I was trying to say was that I had been running it for the past week without --master, and this is what happens when I try to start it with --master.

Comment by Eliot Horowitz (Inactive) [ 02/Jul/11 ]

Had you been running without --master before?

Generated at Thu Feb 08 03:02:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.