[SERVER-6668] Mongos crashes consistently under high load for various reasons Created: 31/Jul/12  Updated: 08/Mar/13  Resolved: 31/Aug/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug
Priority: Blocker - P1
Reporter: Travis Reeder
Assignee: Asya Kamsky
Resolution: Cannot Reproduce
Votes: 1
Labels: crash, failure, linux, mongos, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04, aws, m1.xlarge instance types


Operating System: Linux
Participants:

 Description   

Mongos consistently crashes under high load (~2500 concurrent requests). This is easily reproducible on our system. Stack traces and logs are at the links below.

1) Stack traces from the first crash; two of the three mongos processes died at the same time.
https://gist.github.com/a70ccfbdd5e58430f68c

2) mongos log: http://sprunge.us/OEiU. In this one we noticed: "ERROR: Out of file descriptors. Waiting one second before trying to accept more connections." We increased the file descriptor limit, and that error went away for now. Regardless, mongos should not crash like this.

Somewhere in here, we increased the number of mongos processes from 3 to 12. It seemed to work a bit better, but it is still easy to make them crash.

3) And this just goes on and on: we restart them and can easily kill them again with a bit of load.

http://sprunge.us/IfUR
http://sprunge.us/gFSS
http://sprunge.us/dLCj
http://sprunge.us/XSOb
http://sprunge.us/TZZD
http://sprunge.us/ZNUO



 Comments   
Comment by Asya Kamsky [ 06/Aug/12 ]

Are you still seeing this problem with the larger number of mongos processes? I haven't been able to reproduce the behavior you're seeing.

Comment by Eliot Horowitz (Inactive) [ 01/Aug/12 ]

That should be enough, but unlimited would be better and doesn't cause issues.

Comment by Travis Reeder [ 31/Jul/12 ]

mongodb@ip-10-115-87-10:/home/ubuntu$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 55459
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 55459
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
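
Note that `ulimit -a` reports the limits of the shell it runs in; a mongos started from an init script or another session can run with different limits. As a rough sketch (not something taken from this ticket), the limits actually in effect for a running process can be read from `/proc/<pid>/limits` on Linux. The function names `parse_limits` and `process_limits` are hypothetical helpers introduced for illustration, and the mongos PID is assumed to be passed on the command line.

```python
import re
import sys
from pathlib import Path


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are kept as strings because a limit may be "unlimited".
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def process_limits(pid):
    """Read the limits in effect for a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    # Assumed usage: pass the PID of a running mongos as the only argument.
    limits = process_limits(int(sys.argv[1]))
    for name in ("Max open files", "Max processes"):
        soft, hard = limits[name]
        print(f"{name}: soft={soft} hard={hard}")
```

If the "Max open files" value printed for the mongos PID is lower than the 65535 shown by `ulimit -a`, the raised limit has probably not been applied to the mongos process itself.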

Comment by Travis Reeder [ 31/Jul/12 ]

We increased them to 65535. Is that sufficient?

Comment by Eliot Horowitz (Inactive) [ 31/Jul/12 ]

What are the file descriptor and process limits?
Looks like they might be too low.
