[SERVER-14696] Periodic crash in mongod Created: 26/Jul/14  Updated: 10/Dec/14  Resolved: 26/Jul/14

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File crash.log    
Operating System: ALL
Participants:

 Description   

Version is 2.6.1. I noticed the primary and secondary of the replica set were switching roles every so often. Once I looked at the logs, I saw it was due to a crash. If you need any more info, please let me know.

 2014-07-26T08:37:43.669+0000 [initandlisten] can't create new thread, closing connection
2014-07-26T08:37:43.665+0000 [conn192269481] end connection 172.31.39.18:45989 (30574 connections now open)
2014-07-26T08:37:43.670+0000 [journal] mem info: vsize: 113139 resident: 7467 mapped: 39711
2014-07-26T08:37:43.672+0000 [conn192269480] end connection 172.31.39.19:37163 (30568 connections now open)
2014-07-26T08:37:43.672+0000 [initandlisten] connection accepted from 172.31.39.20:36061 #192269485 (30570 connections now open)
2014-07-26T08:37:43.673+0000 [initandlisten] connection accepted from 172.31.39.18:45993 #192269486 (30571 connections now open)
2014-07-26T08:37:43.674+0000 [initandlisten] connection accepted from 172.31.39.18:55162 #192269487 (30572 connections now open)
2014-07-26T08:37:43.674+0000 [initandlisten] connection accepted from 172.31.39.20:36065 #192269488 (30573 connections now open)
2014-07-26T08:37:43.678+0000 [conn192269421]  authenticate db: Lq4XRNsx { authenticate: 1, user: "Lq4XRNsx", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.698+0000 [initandlisten] connection accepted from 172.31.39.18:55169 #192269489 (30573 connections now open)
2014-07-26T08:37:43.706+0000 [conn192269452]  authenticate db: bC6BXris { authenticate: 1, user: "bC6BXris", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.706+0000 [conn192269454]  authenticate db: bC6BXris { authenticate: 1, user: "bC6BXris", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.707+0000 [conn192269459]  authenticate db: bC6BXris { authenticate: 1, user: "bC6BXris", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.707+0000 [conn192269460]  authenticate db: bC6BXris { authenticate: 1, user: "bC6BXris", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.720+0000 [conn192269424]  authenticate db: Lq4XRNsx { authenticate: 1, user: "Lq4XRNsx", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.730+0000 [initandlisten] connection accepted from 172.31.39.20:59382 #192269490 (30574 connections now open)
2014-07-26T08:37:43.739+0000 [conn192269486] end connection 172.31.39.18:45993 (30573 connections now open)
2014-07-26T08:37:43.751+0000 [initandlisten] connection accepted from 172.31.39.19:57001 #192269491 (30574 connections now open)
2014-07-26T08:37:43.780+0000 [conn192269427]  authenticate db: Lq4XRNsx { authenticate: 1, user: "Lq4XRNsx", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.780+0000 [conn192269431]  authenticate db: Lq4XRNsx { authenticate: 1, user: "Lq4XRNsx", nonce: "xxx", key: "xxx" }
2014-07-26T08:37:43.782+0000 [conn192266979] end connection 172.31.39.18:51929 (30573 connections now open)
2014-07-26T08:37:43.782+0000 [conn192266980] end connection 172.31.39.18:51930 (30572 connections now open)
2014-07-26T08:37:43.782+0000 [conn192266981] end connection 172.31.39.18:51931 (30571 connections now open)
2014-07-26T08:37:43.783+0000 [conn192266982] end connection 172.31.39.18:51932 (30570 connections now open)
2014-07-26T08:37:43.783+0000 [conn192266983] end connection 172.31.39.18:51933 (30569 connections now open)
2014-07-26T08:37:43.796+0000 [initandlisten] connection accepted from 172.31.39.21:54561 #192269492 (30570 connections now open)
2014-07-26T08:37:43.803+0000 [conn192266964] end connection 172.31.39.19:40506 (30569 connections now open)
2014-07-26T08:37:43.803+0000 [conn192266966] end connection 172.31.39.19:40512 (30568 connections now open)
2014-07-26T08:37:43.803+0000 [conn192266973] end connection 172.31.39.19:40517 (30567 connections now open)
2014-07-26T08:37:43.803+0000 [conn192266974] end connection 172.31.39.19:40518 (30566 connections now open)
2014-07-26T08:37:43.803+0000 [conn192266975] end connection 172.31.39.19:40519 (30565 connections now open)
2014-07-26T08:37:43.828+0000 [journal] SEVERE: Got signal: 6 (Aborted).
Backtrace:0xf36096 0xf35e70 0x7f034913efc0 0x7f034913ef49 0x7f0349140348 0xee973b 0xd16eda 0x9c899a 0x9c9107 0x9c970a 0xf6eb6c 0x7f034a52af18 0x7f03491ede0d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xf36096]
 /usr/bin/mongod() [0xf35e70]
 /lib64/libc.so.6(+0x33fc0) [0x7f034913efc0]
 /lib64/libc.so.6(gsignal+0x39) [0x7f034913ef49]
 /lib64/libc.so.6(abort+0x148) [0x7f0349140348]
 /usr/bin/mongod(_ZN5mongo16MemoryMappedFile16remapPrivateViewEPv+0xfb) [0xee973b]
 /usr/bin/mongod(_ZN5mongo17DurableMappedFile19remapThePrivateViewEv+0x2a) [0xd16eda]
 /usr/bin/mongod(_ZN5mongo3dur16REMAPPRIVATEVIEWEv+0x20a) [0x9c899a]
 /usr/bin/mongod() [0x9c9107]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x1da) [0x9c970a]
 /usr/bin/mongod() [0xf6eb6c]
 /lib64/libpthread.so.0(+0x7f18) [0x7f034a52af18]
 /lib64/libc.so.6(clone+0x6d) [0x7f03491ede0d]
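The mangled C++ frame names in the backtrace can be decoded with c++filt (part of GNU binutils). For example, the frame where the abort originates:

```shell
# Demangle the remapPrivateView frame from the backtrace above:
echo '_ZN5mongo16MemoryMappedFile16remapPrivateViewEPv' | c++filt
# -> mongo::MemoryMappedFile::remapPrivateView(void*)
```

This confirms the crash happens in the journal thread while remapping the private (copy-on-write) view of a memory-mapped data file.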



 Comments   
Comment by Asya Kamsky [ 26/Jul/14 ]

It looks like you ran out of memory on the machine; it's not able to allocate heap space for new incoming connections:

2014-07-26T06:03:37.361+0000 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable

In addition, when it tried to remap the private view, it couldn't allocate memory:

2014-07-26T06:05:11.872+0000 [journal] ERROR: 13601 Couldn't remap private view: errno:12 Cannot allocate memory

You might check physical memory availability/usage on this host - I don't see any specific evidence of a bug here.

Asya
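As a rough checklist (not part of the original ticket), these read-only commands show the resource ceilings on a Linux host that can produce "can't create new thread" (errno 11, EAGAIN) and "Cannot allocate memory" (errno 12, ENOMEM):

```shell
# Read-only checks; safe to run on any Linux host.
free -m                               # physical memory and swap, in MB
cat /proc/sys/kernel/threads-max      # system-wide thread cap
cat /proc/sys/vm/max_map_count        # per-process memory-map region cap
ulimit -u                             # max user processes/threads
ulimit -v                             # virtual memory limit (KB)
```

Comparing these against the connection count in the log (~30,500 open connections, each backed by a thread) is a quick way to see which ceiling is being hit.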

Comment by Asya Kamsky [ 26/Jul/14 ]

Are you using MMS so I can see various attributes for the host?

Comment by Dharshan Rangegowda [ 26/Jul/14 ]

Quick correction: it is not pure CentOS; it is Amazon Linux.

Comment by Dharshan Rangegowda [ 26/Jul/14 ]

The log file from a crash on the primary is attached.

Comment by Dharshan Rangegowda [ 26/Jul/14 ]

The crash is on the primary. Typically, whichever machine becomes primary will crash within a short while. I will attach the crash logs shortly.

The OS is CentOS. Here is the output of ulimit -a:

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 136504
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Comment by Asya Kamsky [ 26/Jul/14 ]

You said the primary and secondary were switching: were they both crashing? If so, can you include logs from both?

What is the exact OS/environment that the crashing mongod is running in? Is it possible it's out of resources (what is ulimit -a on that system)?

Generated at Thu Feb 08 03:35:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.