[SERVER-13889] Crash because of "mmap failed with out of memory" Created: 09/May/14  Updated: 10/Dec/14  Resolved: 27/Jun/14

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Allen K Assignee: Kevin Pulo
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I use a MongoDB replica set, but it suddenly crashed. When I attempt to start it up with the command "./mongod --dbpath=../data --replSet replset_BRG --port 28010 --logpath=../log/log --logappend --fork --directoryperdb", the following error occurs:

***** SERVER RESTARTED *****
 
 
Fri May  9 16:29:45 [initandlisten] MongoDB starting : pid=25550 port=28010 dbpath=/home/work/mongo-server/bin/../data 64-bit host=
 
Fri May  9 16:29:45 [initandlisten] db version v2.2.1, pdfile version 4.5
Fri May  9 16:29:45 [initandlisten] git version: d6764bf8dfe0685521b8bc7b98fd1fab8cfeb5ae
Fri May  9 16:29:45 [initandlisten] build info: Linux domU-12-31-39-16-30-A2 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64
BOOST_LIB_VERSION=1_49
Fri May  9 16:29:45 [initandlisten] options: { dbpath: "../data", directoryperdb: true, fork: true, logappend: true, logpath: "../log/log", port: 28010, replSet: "replset_BRG" }
Fri May  9 16:29:45 [initandlisten] journal dir=/home/work/mongo-server/bin/../data/journal
Fri May  9 16:29:45 [initandlisten] recover begin
Fri May  9 16:29:45 [initandlisten] recover lsn: 0
Fri May  9 16:29:45 [initandlisten] recover /home/work/mongo-server/bin/../data/journal/j._0
Fri May  9 16:29:45 [initandlisten] recover cleaning up
Fri May  9 16:29:45 [initandlisten] removeJournalFiles
Fri May  9 16:29:45 [initandlisten] recover done
Fri May  9 16:29:45 [initandlisten] waiting for connections on port 28010
Fri May  9 16:29:45 [websvr] admin web console waiting for connections on port 29010
Fri May  9 16:29:45 [initandlisten] connection accepted from 10.46.23.36:23864 #1 (1 connection now open)
Fri May  9 16:29:45 [rsStart] replSet I am *:28010
Fri May  9 16:29:45 [rsStart] replSet STARTUP2
Fri May  9 16:29:45 [rsHealthPoll] replSet member *:28010 is up
Fri May  9 16:29:45 [rsHealthPoll] replSet member *:28010 is now in state PRIMARY
Fri May  9 16:29:45 [rsHealthPoll] replSet member *:28010 is up
Fri May  9 16:29:45 [rsHealthPoll] replSet member *:28010 is now in state ARBITER
Fri May  9 16:29:46 [rsSync] replSet still syncing, not yet to minValid optime 536b0dc8:1
Fri May  9 16:29:47 [initandlisten] connection accepted from 10.42.152.12:56039 #2 (2 connections now open)
Fri May  9 16:29:47 [conn2] end connection *:56039 (1 connection now open)
Fri May  9 16:29:47 [initandlisten] connection accepted from 10.42.152.12:56040 #3 (2 connections now open)
Fri May  9 16:29:47 [initandlisten] connection accepted from 10.23.240.150:59369 #4 (3 connections now open)
Fri May  9 16:29:47 [initandlisten] connection accepted from 10.23.240.150:59374 #5 (3 connections now open)
Fri May  9 16:29:51 [rsBackgroundSync] replSet syncing to: 10.42.152.12:28010
Fri May  9 16:29:51 [rsSync] replSet still syncing, not yet to minValid optime 536b0dc8:1
Fri May  9 16:29:53 [conn5] end connection *:59374 (2 connections now open)
Fri May  9 16:29:53 [initandlisten] connection accepted from 10.23.240.150:21190 #6 (3 connections now open)
Fri May  9 16:29:56 [rsSyncNotifier] replset setting oplog notifier to 10.42.152.12:28010
Fri May  9 16:29:56 [repl prefetch worker] ERROR:   mmap() failed for /home/work/mongo-server/bin/../data/bridge/bridge.59 len:2146435072 errno:12 Cannot allocate memory
Fri May  9 16:29:56 [repl prefetch worker] ERROR: mmap failed with out of memory. (64 bit build)
Fri May  9 16:29:56 [repl writer worker 1] ERROR:   mmap() failed for /home/work/mongo-server/bin/../data/bridge/bridge.59 len:2146435072 errno:12 Cannot allocate memory
Fri May  9 16:29:56 [repl writer worker 1] ERROR: mmap failed with out of memory. (64 bit build)
Fri May  9 16:29:56 [repl writer worker 1] ERROR: writer worker caught exception: can't map file memory on: { ts: Timestamp 1399524808000|1, h: 1490830305013686414, v: 2, op: "i", ns: "bridge.track", o: { _id: ObjectId('536b0dc8e662b15490342d01'), bid: 515059940390023161, site: 2758320, type: 0, tm: 1399523870, date: 20140508, bdid: "BD1BCA60A411EBF3126DAAFFA7450E0B:FG=1", wt: 6, ref: "http://.com/brgview/?p=010b5fb9f4c42c2ea54a04de2716c98e52c8260578b2ff7d6ce067637d4b7aeef66afd58f0b9a1d5b91f2243d7e8ed138f64097a5166140bc...", url: "http://www.55599120.com/", title: "<mis-encoded multi-byte text>" } }
Fri May  9 16:29:56 [repl writer worker 1]   Fatal Assertion 16360
0x9a28c1 0x967ea3 0x845e7e 0x97614d 0x9e82f9 0xacba3d 0xb91039
 [0x9a28c1]
 [0x967ea3]
 [0x845e7e]
 [0x97614d]
 [0x9e82f9]
 [0xacba3d]
 [0xb91039]
Fri May  9 16:29:56 [repl writer worker 1]
 
***aborting after fassert() failure
 
 
Fri May  9 16:29:56 Got signal: 6 (Aborted).
Fri May  9 16:29:56 Backtrace:
0x9a28c1 0x400c69 0xacfc00 0xacfbcd 0xb624a0 0x967ede 0x845e7e 0x97614d 0x9e82f9 0xacba3d 0xb91039
 [0x9a28c1]
 [0x400c69]
 [0xacfc00]
 [0xacfbcd]
 [0xb624a0]
 [0x967ede]
 [0x845e7e]
 [0x97614d]
 [0x9e82f9]
 [0xacba3d]
 [0xb91039]

Memory and disk space are sufficient:

 free -m
             total       used       free     shared    buffers     cached
Mem:         64350      64322         27          0         98      56113
-/+ buffers/cache:       8110      56240
Swap:          996        974         22

df -hl
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             8.7G  4.3G  4.5G  49% /
/dev/sda3             1.9T  975G  938G  51% /home
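
For context, a diagnostic sketch (the pid 25550 is taken from the startup log above; substitute the current one): mongod 2.2 memory-maps each ~2 GB data file, so mmap() can return errno 12 even when free physical memory looks ample, because the refusal comes from a kernel or per-process limit rather than from RAM. These standard Linux interfaces show which limit is in play:

# kernel cap on mappings per process vs. mappings mongod currently holds
cat /proc/sys/vm/max_map_count
wc -l /proc/25550/maps

# commit accounting; enforced when strict overcommit (mode 2) is in effect
grep -i commit /proc/meminfo

# per-process virtual address space cap; "unlimited" rules this out
ulimit -v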
 



 Comments   
Comment by Thomas Rueckstiess [ 27/Jun/14 ]

Hi Allen,

We haven't heard back from you so I'm assuming this is no longer an issue for you. If you are still seeing the issue after upgrading to a more recent OS version, feel free to re-open the ticket.

Regards,
Thomas

Comment by Kevin Pulo [ 30/May/14 ]

Hi Allen,

Have you had a chance to upgrade your systems yet, or test a more recent distribution? Are you still seeing this problem on RHEL/CentOS 5 or 6?

Best regards,
Kev

Comment by Kevin Pulo [ 19/May/14 ]

Hi Allen,

I apologise for the delay getting back to you. I notice that your operating system is extremely old - RHEL/CentOS 4.3 is over 8 years old now. In the past we have had reports of similar problems on such old distributions. Can you try upgrading to a more recent version of RHEL/CentOS, such as version 5.10 or 6.5? If you can reproduce this problem after upgrading, or on another host that is running a recent distribution, please let us know and we can continue investigating.

Best regards,
Kev

Comment by Allen K [ 14/May/14 ]

Hi Thomas,
I was wondering whether this issue will get some attention. Thanks a lot.
Allen

Comment by Allen K [ 09/May/14 ]

cat /proc/sys/vm/overcommit_memory
0
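
Mode 0 is heuristic overcommit, which rarely refuses file-backed mappings, so this value alone does not explain the failure. Comparing the commit counters (standard /proc/meminfo fields) helps rule commit accounting in or out:

grep -E 'CommitLimit|Committed_AS' /proc/meminfo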

Comment by Allen K [ 09/May/14 ]

Hi Thomas Rueckstiess,
My OS info is:
uname -a
Linux cq01-cq01 2.6.9_5-9-0-0 #1 SMP Wed Jun 23 14:03:19 CST 2010 x86_64 x86_64 x86_64 GNU/Linux
Thanks for your help,
Allen K

Comment by Allen K [ 09/May/14 ]

ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 30720
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
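
With virtual memory unlimited here, address-space ulimits are ruled out. One remaining per-process cap worth checking (a sketch, assuming a single running mongod) is the kernel's mapping-count limit:

sysctl vm.max_map_count
wc -l /proc/$(pidof mongod)/maps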

Comment by Thomas Rueckstiess [ 09/May/14 ]

Hi Allen,

Can you please attach the output of ulimit -a ?

Thanks,
Thomas
