[SERVER-14155] Segmentation fault Created: 04/Jun/14  Updated: 10/Dec/14  Resolved: 17/Jun/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: qing cao Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.3


Attachments: log.tar.gz, mongod.log
Operating System: Linux
Participants:

 Description   

A segmentation fault occurred on a secondary after it had been running for a period of time.



 Comments   
Comment by Ramon Fernandez Marina [ 17/Jun/14 ]

caoqing, we're going to mark this issue as resolved. Feel free to reopen if it happens again after removing the nproc limitation.

Comment by qing cao [ 17/Jun/14 ]

I found I had set a wrong parameter by mistake.
This is my original limits file "/etc/security/limits.d/90-nproc.conf":

  * soft noproc 32000
  * hard noproc 32000

After I fixed it, the error never occurred again. Here is the corrected file:

  * soft nproc 32000
  * hard nproc 32000

But I don't know why I came across this issue in the first place.
I checked that the max-processes limit was in place at 32000 the whole time:

cat /proc/`pidof mongod`/limits|grep -i "max processes"
Max processes 32000 32000 processes
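
For reference, a minimal sketch (illustrative only, not from the original comment) for comparing mongod's "max processes" limit against the number of threads it is actually using:

  # Sketch: assumes a single mongod instance is running on the host.
  pid=$(pidof mongod)
  grep -i "max processes" /proc/$pid/limits
  grep "^Threads:" /proc/$pid/status   # number of threads mongod currently has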

Comment by Ramon Fernandez Marina [ 05/Jun/14 ]

Looking at the logs, I get the impression that there are too many connection attempts to the secondary and that some OS limit is being hit (e.g., number of file descriptors, memory allocation). It would be useful to see the mongod.log file for the last few hours/days leading up to this error. If the error happens soon after restarting MongoDB on the secondary, then please proceed as follows:

  • stop MongoDB on the secondary
  • remove mongod.log
  • re-start MongoDB

and send us the resulting mongod.log file when the error happens. The output of ulimit -a could be useful as well.
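
A shell sketch of those steps, assuming a typical CentOS 6 package install managed by the service init script and the default log path (both are assumptions; adjust to your setup):

  # Sketch only; the service name and log path are assumptions.
  sudo service mongod stop
  sudo rm /var/log/mongo/mongod.log     # use the path configured via logpath
  sudo service mongod start
  ulimit -a                             # run as the user that starts mongod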

Note that your logs also contain the following:

possible SYN flooding on port 27017. Sending cookies.

The SYN flooding message might be legitimate, but it could also indicate a DoS attack. Is it possible for you to post the output of netstat -a?
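
If the full netstat -a output is too large to post, a summary of TCP connection states on the mongod port can also be informative (illustrative command, not part of the original request; port 27017 assumed from the log):

  # Count connections involving port 27017 by TCP state (sketch).
  netstat -ant | grep ':27017' | awk '{print $6}' | sort | uniq -c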

Comment by qing cao [ 05/Jun/14 ]

Hi,
This issue is causing me a lot of trouble right now.
I've uploaded the log information as attachments.
It occurs frequently, but only on a secondary.
Here is the live configuration of one of my replica sets:
rs_main:PRIMARY> rs.conf()
{
    "_id" : "rs_main",
    "version" : 18,
    "members" : [
        { "_id" : 2, "host" : "dal05mgo05.sl.dx:27017", "priority" : 60 },
        { "_id" : 3, "host" : "dal07mgo02.sl.dx:27017" },
        { "_id" : 6, "host" : "dal05mgo08.sl.dx:27017" }
    ]
}

Comment by J Rassi [ 04/Jun/14 ]

Hi,

I notice the following from the log excerpt you posted:

src/third_party/gperftools-2.0/src/central_freelist.cc:322] tcmalloc: allocation failed 32768 

Sat May 31 04:39:17.109 [conn1888120] Assertion: 16070:out of memory BufBuilder::grow_reallocate

I conclude that the mongod process failed to allocate memory, likely due to an OOM condition in the OS. The next step would be to determine whether the cause of the issue is a novel memory leak in the server.

  • Is this issue reproducible? If so, have you encountered it on any other replica set member before?
  • Can you post the output of running the following on both this replica set member and the primary member?
    • dmesg | tail -100
    • top -b -n 1
    • free
    • mongostat -n 10
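
A small collection sketch that captures all of the above into a single file to attach here (illustrative only; the output filename is an assumption):

  # Run on both the affected secondary and the primary; "diag-*.txt" name is arbitrary.
  out=diag-$(hostname)-$(date +%Y%m%d%H%M%S).txt
  {
    echo "== dmesg =="; dmesg | tail -100
    echo "== top =="; top -b -n 1
    echo "== free =="; free
    echo "== mongostat =="; mongostat -n 10
  } > "$out" 2>&1
  echo "Wrote $out"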

~ Jason Rassi
